Abstract

Researchers often calculate ratios of measured quantities. Specifying con- 
fidence limits for ratios is difficult and the appropriate methods are often 
unknown. Appropriate methods are described (Fieller, Taylor, special boot- 
strap methods). For the Fieller method a simple geometrical interpretation 
is given. Monte Carlo simulations show when these methods are appropriate 
and that the most frequently used methods (index method and zero-variance 
method) can lead to large liberal deviations from the desired confidence level. 
It is discussed when we can use standard regression or measurement error 
models and when we have to resort to specific models for heteroscedastic 
data. Finally, an old warning is repeated that we should be aware of the 
problems of spurious correlations if we use ratios. 



Ratios: Confidence limits & proper use 






In a number of situations, researchers are interested in the ratio of two mea- 
sured quantities. For example, in bioassay researchers are interested in the 
potency of a drug relative to a standard drug (Finney, 1978). Similarly, 
whenever we calculate "percentage change" or "relative change" we calculate
a ratio (Miller, 1986). Another example is inverse prediction in regression
analysis. Assume researchers fit the linear model $y_i = \alpha + \beta x_i + \epsilon_i$
(with $i = 1, \ldots, n$) and then want to predict at which $x_0$ to expect a
certain value $y_0$. This is calculated as $\hat{x}_0 = (y_0 - \hat\alpha)/\hat\beta$,
which is again a ratio of the random parameter estimates $\hat\alpha$ and
$\hat\beta$. Inverse prediction is often used in calibration
procedures (cf. Kendall, Stuart, & Ord, 1991; Miller, 1986; Buonaccorsi, 
2001). 

Similar situations arise in psychology and in the neurosciences. Ratios 
of measured quantities have been calculated in the investigation of perceived 
speed (Hammett, Thompson, & Bedingham, 2000), perceived slant (Proffitt, 
Bhalla, Gossweiler, & Jonathan, 1995), distance perception (Emde, Schwarz, 
Gomez, Budelli, & Grant, 1998; Proffitt, Stefanucci, Banton, & Epstein, 
2003), visual discrimination performance (Watson & Robson, 1981), human 
motor control (Carrier, Heglund, & Earls, 1994; Serrien et al., 2002; Serrien 
& Wiesendanger, 2001; Turrell, Li, & Wing, 2001), psychological and bio- 
logical bases of stress, drug addiction, and emotion (Lees & Neufeld, 1999; 
Maes, Christophe, Bosmans, Lin, & Neels, 2000; Thomas, Beurrier, Bonci, 
& Malenka, 2001; Yamawaki, Tschanz, & Feick, 2004). 

Specifying confidence limits for ratios is a well-known problem in statistics
with a number of unusual properties. The classic solution to this problem is 
called "Fieller's theorem" (Fieller, 1940, see also Fieller, 1944, 1954; Read, 
1983; Buonaccorsi, 2001) and is routinely used in a number of areas (e.g., 
in bioassay and health economics, cf. Finney, 1978; Briggs, O'Brien, & 
Blackhouse, 2002). Quite surprisingly, however, this issue seems to be largely 
unknown in psychology and the cognitive neurosciences. For example, none of 
the above cited studies used Fieller's method. Most studies unquestioningly 
used a method which I will call the "index" method and which turns out to 
require very specific assumptions about the distribution of numerator and 
denominator of the ratio. If these assumptions are not met, the method can 
lead to confidence limits with much too small coverage. Other studies used 
another ad-hoc method (the "zero-variance" method), which is even more 
problematic. 

The index method is closely related to the use of indices which are de- 
termined on a per observation basis and then processed further as if they 
were normal observations. Examples are the body mass index (body weight
divided by height squared) or income per capita (total personal income di- 
vided by total population). Indices are quite frequently used in medicine and 
in econometrics and have been the focus of a long and heated controversy
about spurious correlations (Pearson, 1897; Neyman, 1979; Kronmal, 1993), 
such that some caution is in order here. I will sketch the main problems and 
remedies. 

Before discussing the details of the different methods, let me describe 
the unusual problems posed by ratios. The main problem arises from the 
fact that the function y/x has a singularity at x = 0. Therefore, if the
denominator is noisy and "too close" to zero the estimate for the ratio goes 
astray. This problem is so serious that the probability distribution of the 
ratio shows unusual behavior. For example, neither the expected value nor
the variance of the ratio exists if the denominator is normally distributed.
We can only specify "pseudo" values for the expected value and the variance 
in cases where the denominator is "far" from zero. 

A further example of the unusual behavior of ratios is the Cauchy distribution.
This occurs if, in addition to a normally distributed denominator, the
numerator is also normally distributed (and both are independent and have 
an expected value of zero). The probability-density of the Cauchy distribu- 
tion looks like that of a normal distribution, but with heavier tails. Neither 
the expected value nor the variance exist for this distribution. Even worse, if 
we calculate the mean of independent, identically Cauchy-distributed vari- 
ables we find that the mean follows the same Cauchy distribution as each 
of the individual variables. That is, the mean is no more informative than 
any of the individual values (e.g., Johnson & Katz, 1970). This is in strong 
contrast to the "typical" behavior of random variables for which expected 
value and variance exist. Typically, calculating the mean of independent, 
identically distributed (i.i.d.) random variables leads to a decrease of the 
variance and therefore allows us to use the mean as a better estimate for the 
expected value. 
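
The failure of averaging for Cauchy-distributed values can be illustrated with a small simulation. The following is a minimal sketch, not part of the original analysis: the function names, the seed, and the sample sizes are arbitrary illustrative choices. It compares the interquartile range of sample means of normal variables with that of sample means of ratios of two independent zero-mean normal variables.

```python
import random
import statistics

def iqr_of_means(draw, n_per_mean, n_means, rng):
    """Interquartile range of n_means sample means, each over n_per_mean draws."""
    means = [statistics.fmean(draw(rng) for _ in range(n_per_mean))
             for _ in range(n_means)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

def normal_draw(rng):
    return rng.gauss(0.0, 1.0)

def cauchy_draw(rng):
    # Ratio of two independent zero-mean normal variables (Cauchy distributed).
    return rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0)

rng = random.Random(1)
for n in (1, 100):
    # The spread of normal means shrinks with n; the spread of Cauchy means does not.
    print(n,
          round(iqr_of_means(normal_draw, n, 1000, rng), 3),
          round(iqr_of_means(cauchy_draw, n, 1000, rng), 3))
```

The interquartile range is used instead of the standard deviation because the variance of the Cauchy distribution does not exist.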

Given this unusual behavior, it does not seem surprising that we need 
special methods to deal with ratios. I will discuss these methods in four 
parts: 

The first part ("The standard case") discusses confidence limits for ratios
if numerator and denominator are normally distributed. In this part, I give
a simple geometric description of the Fieller method, a discussion of
alternatives to Fieller's method, and of recent developments in the area of the
bootstrap which allow us to relax the assumption of normality. Also, I show in
simulations under which conditions the often used index and zero-variance 
methods fail and to which extent this is relevant for the interpretation of 
existing studies. For this a number of sample studies are described and the 
variability of numerator and denominator in these studies is compared to the 
results of the simulations. (Details about the studies can be found in the 
supplementary material provided with this article). A short summary with 
recommendations is given at the end of this part. 

The second part ("When can we use regression methods?") views ratios 
as the special case of a linear model with zero intercept, such that the ratio 
corresponds to the slope. Also, we assume in this part homoscedastic data. 
That is, the variability of the numerator is assumed to be constant over the 
range of observations of the denominator. Linear models allow us to deal with 
more complex situations, for example the comparison of ratios. I discuss
when we can use standard regression methods and when we have to use
the more complicated measurement error models and show the relationship
between the Fieller method and measurement error models.

The third part ("When can we use indices?") discusses which models 
are needed to justify the use of indices and of the index method. We will 
see that these models require a special form of heteroscedastic data with the 
numerator having larger variability at larger values of the denominator. 

The fourth part ("Beware: Spurious correlations and faulty ratio standards")
discusses the century-old problem of spurious correlations and faulty
ratio standards. Although these problems could appear with any of the meth- 
ods discussed in the first three parts of the article, they are typically discussed 
in the context of indices. We will see that the central question is whether 
we are justified in assuming that the intercept of a linear model is zero (such 
that the ratio corresponds to the slope of the model) or whether we have to 
assume a non-zero intercept. 

At the end of the article, an overall summary is given which allows the 
practitioner to quickly decide which method is appropriate for the situation 
at hand. 
The standard case

Notation, assumptions, and point-estimate

Let X, Y be random variables with expected values E(X) and E(Y) and
the ratio of interest: $\rho := E(Y)/E(X)$. Very often, we encounter the case
of N paired measurements $(x_i, y_i)$ with $i = 1 \ldots N$ (assumed to be i.i.d.).
When discussing the alternatives to Fieller's method, we will see that some 
of these methods are restricted to paired measurements. For simplicity, I will 
restrict most of the discussion to this important case (for generalizations of 
the Fieller method to independent samples with unequal variances see Wu & 
Jiang, 2001; Lee & Lin, 2004). 

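Unbiased estimators for the expected values are the sample means $\bar x$ and
$\bar y$. Their variances and covariances are estimated by the usual estimators:

$$\hat\sigma^2_{\bar X} = \frac{\hat\sigma^2_X}{N} = \frac{1}{N(N-1)} \sum_{i=1}^{N} (x_i - \bar x)^2$$
$$\hat\sigma^2_{\bar Y} = \frac{\hat\sigma^2_Y}{N} = \frac{1}{N(N-1)} \sum_{i=1}^{N} (y_i - \bar y)^2$$
$$\hat\sigma_{\bar X \bar Y} = \frac{\hat\sigma_{XY}}{N} = \frac{1}{N(N-1)} \sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y) \qquad (1)$$

The coefficients of variation (CV) for the individual values are:
$CV_X := \sigma_X / E(X)$ and $CV_Y := \sigma_Y / E(Y)$. The CVs for the sample
means are $CV_{\bar X} := \sigma_{\bar X} / E(X)$ and $CV_{\bar Y} := \sigma_{\bar Y} / E(Y)$.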

We assume that (X, Y) is (approximately or exactly) distributed as bivariate
normal. Note that, due to the central limit theorem, it is often a good
approximation to assume the sample means to be normally distributed even
if the individual values are not. For the bootstrap methods it is possible to
relax the assumption of normality, see the discussion there. For generalizations
of the Fieller method to non-normal distributions, such as the
Gamma, Poisson, or Weibull distributions, see Cox (1967) and Wu, Wong,
and Ng (2005). An intuitive point-estimate for the ratio of interest is:

$$\hat\rho := \frac{\bar y}{\bar x} \qquad (2)$$

This estimator is often used in conjunction with the different methods to 
determine confidence limits, as described below (an exception is the index 
method). As we are dealing with a ratio, this estimator shows unusual behavior:
First, neither expected value nor variance exist and the probability
distribution is complex (cf. Marsaglia, 1965 and Hinkley, 1969, 1970). In 
cases where the denominator has a small CV we can specify "pseudo" values 
for expected value and variance. We do this by truncating the distribution 
such that the denominator cannot get close to zero. Second, the estimator 
is biased. This can be seen by performing a second order Taylor expan- 
sion on the ratio. Certain corrections have been proposed (Beale, 1962; Tin, 
1965; Durbin, 1959; Rao, 1981; Dalabehera & Sahoo, 1995) but in practical 
situations they do not seem to perform much better than the estimator in 
Equation (2) (Miller, 1986), such that they will not be discussed here. Both 
problems are attenuated with larger sample sizes because then the CV of the 
denominator gets smaller. 
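
The quantities introduced in this section can be computed directly from paired data. The following is a minimal sketch that assumes the standard unbiased estimators (division by N - 1) and replaces the population means in the CVs by the sample means; the function name `summary_stats` and the dictionary keys are hypothetical.

```python
import math

def summary_stats(xs, ys):
    """Means, (co)variances of the means, sample CVs, and ratio of means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Unbiased sample variances and covariance of the individual values.
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return {
        "mean_x": mx, "mean_y": my,
        # Variances and covariance of the sample means.
        "var_mx": var_x / n, "var_my": var_y / n, "cov_mxy": cov_xy / n,
        # Sample versions of the CVs (population means replaced by sample means).
        "cv_x": math.sqrt(var_x) / mx, "cv_y": math.sqrt(var_y) / my,
        "cv_mx": math.sqrt(var_x / n) / mx, "cv_my": math.sqrt(var_y / n) / my,
        # Point-estimate of the ratio: ratio of the sample means.
        "ratio": my / mx,
    }

stats = summary_stats([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(stats["ratio"], stats["cv_mx"])
```

A small `cv_mx` indicates that the mean of the denominator is far from zero relative to its standard error, which is the situation in which the ratio estimate behaves well.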

Fieller method 

The central statistic of the Fieller method (Fieller, 1940, see also Fieller,
1944, 1954; Read, 1983; Buonaccorsi, 2001) can be derived as follows: Because
the difference of normal variables is also normally distributed, the term
$\bar y - \rho \bar x$ is normally distributed. Dividing this term by the appropriate
estimator of its standard deviation gives us the statistic:

$$T_0 = \frac{\bar y - \rho \bar x}{\sqrt{\hat\sigma^2_{\bar Y} - 2 \rho\, \hat\sigma_{\bar X \bar Y} + \rho^2 \hat\sigma^2_{\bar X}}} \qquad (3)$$

which follows approximately or exactly a Student t-distribution with df
degrees of freedom.

In most cases this relationship is only approximate and the t-distribution
corresponds to the normal distribution (with $df = \infty$). The relationship is
exact if the following conditions are met: (a) (X, Y) is exactly normally
distributed, (b) the variance-covariance matrix is known up to a proportionality
constant $\sigma^2$, (c) the proportionality constant is estimated by an estimator
$\hat\sigma^2$ independent of $(\bar x, \bar y)$, such that $df \cdot \hat\sigma^2 / \sigma^2$ is
distributed as chi-square with df degrees of freedom. In this case, the
t-distribution has df degrees of freedom (cf. Buonaccorsi, 2001).

To obtain confidence limits for $\rho$, we calculate the set of $\rho$ values for
which the corresponding $T_0$ values lie within the $(1 - \alpha)$ quantiles of the
t-distribution (denoted by $t_q$ in the following). This results in a quadratic
equation, the solution of which gives us three cases: (a) If the denominator $\bar x$
is significantly different from zero at significance level $\alpha$ (that is, if
$\bar x^2 / \hat\sigma^2_{\bar X} > t_q^2$), we obtain a normal confidence interval with
the limits $l_1$ and $l_2$ ("bounded case"):

$$l_{1/2} = \frac{(\bar x \bar y - t_q^2 \hat\sigma_{\bar X \bar Y}) \pm \sqrt{(\bar x \bar y - t_q^2 \hat\sigma_{\bar X \bar Y})^2 - (\bar x^2 - t_q^2 \hat\sigma^2_{\bar X})(\bar y^2 - t_q^2 \hat\sigma^2_{\bar Y})}}{\bar x^2 - t_q^2 \hat\sigma^2_{\bar X}} \qquad (4)$$

If the denominator $\bar x$ is not significantly different from zero, we first need to
calculate:

$$t^2_{unbounded} = \frac{\bar x^2}{\hat\sigma^2_{\bar X}} + \frac{(\bar y \hat\sigma^2_{\bar X} - \bar x \hat\sigma_{\bar X \bar Y})^2}{\hat\sigma^2_{\bar X} (\hat\sigma^2_{\bar X} \hat\sigma^2_{\bar Y} - \hat\sigma^2_{\bar X \bar Y})} \qquad (5)$$

With this we can discriminate between: (b) If $t^2_{unbounded} > t_q^2$, we obtain a
confidence set which excludes only the values between $l_1$ and $l_2$, but all other
values are included ("unbounded/exclusive" case). (c) If $t^2_{unbounded} < t_q^2$, the
confidence set does not exclude any value at all ("unbounded" case).
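
The three cases can be sketched in code. The following is a minimal illustration, not a reference implementation: the function name `fieller_limits` and its return convention are hypothetical, the inputs are the sample means and the estimated variances and covariance of the means, and the quantile is taken from the normal distribution (the df = infinity approximation). Instead of evaluating $t^2_{unbounded}$ explicitly, the sketch checks the sign of the discriminant of the quadratic equation, which is equivalent.

```python
import math
from statistics import NormalDist

def fieller_limits(mx, my, var_mx, var_my, cov_mxy, t_q):
    """Fieller confidence set for the ratio my/mx.

    Returns (kind, l1, l2). kind is "bounded" (interval [l1, l2]),
    "unbounded/exclusive" (everything except the values between l1 and l2),
    or "unbounded" (no value excluded; l1 and l2 are None).
    """
    # Coefficients of the quadratic a*rho^2 - 2*b*rho + c <= 0.
    a = mx ** 2 - t_q ** 2 * var_mx
    b = mx * my - t_q ** 2 * cov_mxy
    c = my ** 2 - t_q ** 2 * var_my
    disc = b ** 2 - a * c
    if a > 0:
        # Denominator significantly different from zero.
        root = math.sqrt(max(disc, 0.0))
        return "bounded", (b - root) / a, (b + root) / a
    if a < 0 and disc > 0:
        # Same condition as t_unbounded^2 > t_q^2.
        root = math.sqrt(disc)
        l1, l2 = sorted(((b - root) / a, (b + root) / a))
        return "unbounded/exclusive", l1, l2
    # Remaining cases (including the boundary a == 0) are treated as unbounded.
    return "unbounded", None, None

t_q = NormalDist().inv_cdf(0.975)  # two-sided 95% level, normal approximation
print(fieller_limits(10.0, 20.0, 1.0, 4.0, 0.0, t_q))
print(fieller_limits(1.0, 20.0, 1.0, 1.0, 0.0, t_q))
print(fieller_limits(0.5, 0.5, 1.0, 1.0, 0.0, t_q))
```

For small samples, `t_q` would be replaced by the corresponding quantile of the t-distribution.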

This might seem to be quite complex behavior, but it is possible to present
these results in a simple, geometrical fashion which is equivalent to Fieller's
method (von Luxburg & Franz, 2004; Guiard, 1989; see also Milliken, 1982).
For this, we depict X on the abscissa and Y on the ordinate of a coordinate
system and draw a line from the origin to the estimates $(\bar x, \bar y)$, as is shown
in Figure 1a. The slope of this line corresponds to the ratio ($\bar y / \bar x$) and is
graphically represented by the intersection of the line with a vertical line at
X = 1. Now we need to determine the confidence limits for the ratio.



Insert Figure 1 about here 



Because all points which lie inside the gray wedge project onto the same
interval, all we need to do is to adjust the size of this wedge such that the
appropriate confidence level for the ratio is achieved. Von Luxburg and Franz
(2004) showed that the wedge forms tangents to an ellipse centered at $(\bar x, \bar y)$.
The projection of the ellipse onto the abscissa corresponds to the marginal
confidence interval of $\bar x$, the projection onto the ordinate corresponds to the
marginal confidence interval of $\bar y$, and the shape of the ellipse is determined
by the covariance $\hat\sigma_{\bar X \bar Y}$.

Using this geometrical method, we can assess the qualitative behavior of
the Fieller confidence limits. If the denominator ($\bar x$) is significantly different
from zero at a significance level of $\alpha$, then the ellipse does not touch the
y-axis and we get normal, bounded confidence intervals (Figure 1a). Now
assume the denominator is not significantly different from zero such that the
ellipse touches the y-axis. In this case the result of the projection of the
wedge onto the X = 1 line is unbounded: We either get a confidence set
which excludes only a small part of all possible values (unbounded/exclusive
case, see Figure 1b; the arrows indicate that the confidence set is unbounded),
or a confidence set which does not exclude any value at all (unbounded case,
Figure 1c).

Unbounded confidence sets are certainly a puzzling result and some remarks
are necessary here: (a) For practical applications, we usually want
bounded confidence intervals. A necessary and sufficient condition for this is
that the $(1 - \alpha)$ confidence interval of the denominator does not contain zero
(which is equivalent to the denominator being significantly different from
zero at a significance level of $\alpha$). (b) If the denominator is not significantly
different from zero, then its confidence interval allows values arbitrarily close
to zero. In consequence, the ratio can assume arbitrarily large (or small)
values and the confidence sets are unbounded. This implies that at the given
confidence level we learned only little from our experimental data (in the
unbounded/exclusive case), or nothing at all (in the unbounded case). While
this might be a discomforting result, it is a simple consequence of the ratio we
are interested in and there is no way to force a different outcome. In fact,
different researchers (Gleser & Hwang, 1987; Koschat, 1987; Hwang, 1995) have
shown that any method which is not able to generate unbounded confidence
limits for a ratio can lead to arbitrarily large deviations from the intended
confidence level (which I will call the "Gleser-Hwang theorem"). We will see
that this theorem limits almost all of the alternatives to the Fieller method,
except for a special bootstrap method (the Hwang-bootstrap method) and
for the case that the true ratio is bounded away from zero.

Note that the unbounded confidence sets contribute to the overall performance
of the method. That is, if in a certain situation there are on average,
say, 10% unbounded confidence sets, these will count as including the
true ratio. For a discussion of the conditional confidence level, given that
the Fieller confidence limits are bounded, see Buonaccorsi and Iyer (1984).
This leads to an interesting problem: If we assume that we report a measured
ratio only if it has bounded confidence intervals, then we effectively
use the conditional confidence level. This can, however, be arbitrarily low
(this follows from the Gleser-Hwang theorem because this conditional procedure
will never generate unbounded confidence limits; see also Neyman,
1954, Tsao, 1998, and Read, 1983). One solution was proposed by Tsao and
Hwang (1998) who suggested estimating the confidence as 1 in the unbounded
case and as $1 - \alpha$ in the other cases (see also Kiefer, 1977).

Alternative approaches 

In this section I give an overview of alternatives to Fieller's method as dis- 
cussed in the statistical literature or employed by previous studies. (I will 
not discuss Bayesian approaches here, because they are based on a different 
notion of confidence limits and a full treatment would go beyond the scope of 
this article. For application of Bayesian approaches to ratios see Mandallaz 
& Mau, 1981; Buonaccorsi & Gatsonis, 1988; Raftery & Schweder, 1993). 

Taylor method 

The Taylor method (sometimes also called delta-method) calculates a linear 
approximation for the sample estimates: 



Because the approximation is linear, it is easy to calculate confidence limits
for this function if we again assume that (X, Y) is bivariate normally
distributed. The approximate confidence limits (denoted by $l_1$ and $l_2$) are
symmetric relative to the sample estimates $(\bar x, \bar y)$ and will never be unbounded:



The Taylor approximation has virtues because the linear function is math- 
ematically easy to handle. However, the approximation will fail for the 
"problematic" cases, when the denominator is close to zero (this is to be 
expected by the Gleser-Hwang theorem because the Taylor limits are never 
unbounded). But, if the denominator has a small CV the Taylor method 
provides a serious alternative to the Fieller method. We will see in the sim- 
ulations that the Taylor method is a very good approximation in these cases 
(cf. Cox, 1990; Polsky, Glick, Willke, & Schulman, 1997; Gardiner, Huebner, 
Jetton, & Bradley, 2001). 
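
As a minimal sketch of such limits, the following assumes the standard first-order (delta-method) variance of a ratio of means, $\widehat{Var}(\bar y / \bar x) \approx (\hat\sigma^2_{\bar Y} - 2 \hat\rho\, \hat\sigma_{\bar X \bar Y} + \hat\rho^2 \hat\sigma^2_{\bar X}) / \bar x^2$, and symmetric limits around the point-estimate. The function name `taylor_limits` is hypothetical, and the exact form of the formulas used in any particular study may differ.

```python
import math

def taylor_limits(mx, my, var_mx, var_my, cov_mxy, t_q):
    """Symmetric delta-method limits for the ratio my/mx (always bounded)."""
    ratio = my / mx
    # First-order approximation of the variance of the ratio of means.
    var_ratio = (var_my - 2.0 * ratio * cov_mxy + ratio ** 2 * var_mx) / mx ** 2
    half_width = t_q * math.sqrt(var_ratio)
    return ratio - half_width, ratio + half_width

# Denominator with a small CV (10 with standard error 1).
print(taylor_limits(10.0, 20.0, 1.0, 4.0, 0.0, 1.96))
```

Because the half-width is always finite, this construction can never signal that the data are uninformative about the ratio, which is the limitation described above.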
Bootstrap methods

The bootstrap (Efron, 1979; Efron & Tibshirani, 1993) is a general purpose
method which allows us to determine confidence limits in an easy and consistent
way, even for very complicated statistics. It uses the measured sample
as a basis for re-sampling with the goal of creating an approximation to the
population distribution. For our ratio problem with N paired measurements,
bootstrap methods would draw a large number of samples (with replacement)
from the set of the measured values $(x_i, y_i)$. Each sample has the same size as
the original sample and we would calculate for each sample the ratio $\bar y / \bar x$. The
distribution of these re-sampled ratios (the "empirical distribution") is the
basis for the calculation of the confidence intervals. In the simplest case, the
confidence intervals are the $(1 - \alpha)$ percentiles of the empirical distribution
("percentile method"). Other methods perform certain corrections, most
notably the widely used BCa method ("bias corrected and accelerated").
These standard bootstrap methods can provide an alternative to approximate
solutions such as the Taylor method, especially in cases where (X, Y) is
not normally distributed (Chaudhary & Stearns, 1996; Polsky et al., 1997;
Briggs, Mooney, & Wonderling, 1999; Briggs et al., 2002).
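
The percentile method can be sketched in a few lines. This is an illustration only: the function name, the default number of replications, and the simple order-statistic rule for the percentiles are assumptions, and the BCa corrections are not implemented.

```python
import random

def percentile_bootstrap(xs, ys, alpha=0.05, n_boot=2000, seed=0):
    """Percentile-bootstrap limits for the ratio of means of paired data."""
    rng = random.Random(seed)
    n = len(xs)
    ratios = []
    for _ in range(n_boot):
        # Resample pairs (not x and y separately) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sum_x = sum(xs[i] for i in idx)
        sum_y = sum(ys[i] for i in idx)
        # Ratio of sums equals ratio of means; assumes sum_x is never zero.
        ratios.append(sum_y / sum_x)
    ratios.sort()
    lower = ratios[int((alpha / 2.0) * n_boot)]
    upper = ratios[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lower, upper

xs = [9.0, 10.0, 11.0, 10.0, 9.5, 10.5, 10.0, 11.0, 9.0, 10.0]
ys = [19.0, 21.0, 22.0, 19.0, 20.0, 21.0, 18.0, 23.0, 19.0, 20.0]
print(percentile_bootstrap(xs, ys))
```

The returned limits are always finite, so this sketch shares the limitation of all standard bootstrap methods for ratios.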

However, standard bootstrap methods face two problems when dealing 
with ratios: (a) Bootstrap confidence limits can be erroneous if the variance 
of the statistic does not exist as in the case of ratios (Athreya, 1987; Knight, 
1989). (b) Because standard bootstrap methods cannot produce unbounded
confidence limits, the Gleser-Hwang theorem applies, such that there can be
arbitrarily large deviations from the intended confidence level for ratios. We
will see in the simulations that this problem occurs if the denominator is 
close to zero. 

Hwang (1995) showed that these problems can be overcome by a special 
bootstrap method which does not perform the bootstrap on the ratio directly, 
but on $T_0$ in Equation (3). The method first uses the bootstrap to determine
the $(1 - \alpha)$ quantiles of $T_0$ and then proceeds as the Fieller method
does (i.e., solves the quadratic equation). Depending on the result of the
quadratic equation, this method can produce unbounded confidence limits
and is therefore the only alternative to Fieller's method which is not limited
by the Gleser-Hwang theorem. We will see in the simulations that the
Hwang-bootstrap method performs as well as the Fieller method if the sample
sizes are large enough. In addition, Hwang (1995) showed that for non-normal
distributions with non-zero skewness, the Hwang-bootstrap method
is superior to Fieller's method: The Fieller method is only first order correct,
with the coverage converging as $O(1/\sqrt{N})$ to the desired coverage,
while the Hwang-bootstrap method is second order correct, converging as
$O(1/N)$ (see also Hall, 1986, 1988 for first/second order correctness). While
this qualifies the Hwang-bootstrap method as an excellent alternative to the 
Fieller method, it should also be clear that the standard bootstrap methods 
will not always fail. The Hwang-bootstrap is, however, more general and it 
is therefore safer to use this method than the standard bootstrap methods. 

Index method 

This method can only be applied to the special case of N paired measurements.
The idea is to determine the ratio for each of the N subjects individually:

$$r_i = \frac{y_i}{x_i} \qquad (8)$$

From these individual ratios (or "indices") the mean $\bar r$ and standard error are
calculated. Assuming that the mean is approximately normally distributed,
confidence limits are calculated.
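
A minimal sketch of this procedure follows; the function name `index_limits` is hypothetical and the quantile `t_q` is supplied by the caller.

```python
import math
import statistics

def index_limits(xs, ys, t_q):
    """Confidence limits from the mean of the per-subject ratios y_i / x_i."""
    r = [y / x for x, y in zip(xs, ys)]          # individual ratios ("indices")
    mean_r = statistics.fmean(r)
    se_r = statistics.stdev(r) / math.sqrt(len(r))  # standard error of the mean
    return mean_r - t_q * se_r, mean_r + t_q * se_r

xs = [1.0, 2.0, 4.0]
ys = [2.0, 2.0, 2.0]
print(index_limits(xs, ys, 2.0))
print(sum(ys) / sum(xs))  # ratio of the means, for comparison
```

In this toy example the mean of the individual ratios (about 1.17) differs clearly from the ratio of the means (about 0.86), which illustrates that the two point-estimates are not interchangeable.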

The index method is used very often (almost all of the example studies in 
the supplementary material provided with this article used this method). We 
can justify the method in the context of a linear model if the denominator 
is bounded away from zero and if the data have a specific heteroscedastic 
structure, such that the numerator has larger variability at larger values
of the denominator. This model will be discussed in the section "When can
we use indices?".

Because the method is used so often and because it seems unlikely that 
the data in all these cases show the specific heteroscedastic structure (the 
studies typically do not report having tested for this), I will first discuss 
what happens if the method is applied to bivariate normal data. We will see 
that in this case the method can lead to large deviations from the desired 
confidence level. Also, if the mean ratio $\bar r$ is used as point-estimate for $\rho$ it
shows systematic biases and can be much more variable than the ratio of the
means $\bar y / \bar x$.
Zero-variance method

Some studies (e.g., Glover & Dixon, 2001a, 2001b, 2002a, 2002b; Haf- 
fenden, Schiff, & Goodale, 2001; Danckert, Nadder, Haffenden, Schiff, & 
Goodale, 2002) estimated the variability of the ratio by dividing the individual
numerator value of each subject by the overall mean of the denominator
(calculated across all subjects):

$$r_i = \frac{y_i}{\bar x} \qquad (9)$$

From these individual ratios the mean and standard error were calculated.
It is easy to show that this procedure is equivalent to dividing the mean of
the numerator and its standard error by $\bar x$, such that we get as estimates for
the mean ratio $\hat\rho = \bar y / \bar x$ and its standard error
$\hat\sigma_{\hat\rho} = \hat\sigma_{\bar Y} / \bar x$. An inspection of the
formulas shows that this procedure does not take into account the variability
of the denominator. Clearly, this is problematic. To justify this approach,
we would have to assume that the measured denominator corresponded to 
the true value of the denominator in the population such that the variability 
of the denominator were zero (for this reason, I call this approach the zero- 
variance approach). In consequence, the zero-variance approach will often 
underestimate the variability of the ratio and will therefore result in liberal 
statistical tests. 
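
The equivalence stated above can be checked numerically. The following is a small sketch with hypothetical function and variable names: it computes the mean and standard error of the values $y_i / \bar x$ and compares them with $\bar y / \bar x$ and $\hat\sigma_{\bar Y} / \bar x$.

```python
import math
import statistics

def zero_variance_estimates(xs, ys):
    """Mean and standard error of y_i divided by the overall mean of x."""
    n = len(xs)
    mx = statistics.fmean(xs)
    r = [y / mx for y in ys]
    return statistics.fmean(r), statistics.stdev(r) / math.sqrt(n)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 9.0]
mean_r, se_r = zero_variance_estimates(xs, ys)

mx = statistics.fmean(xs)
se_my = statistics.stdev(ys) / math.sqrt(len(ys))
# Both comparisons hold regardless of how variable the x values are.
print(math.isclose(mean_r, statistics.fmean(ys) / mx))
print(math.isclose(se_r, se_my / mx))
```

The x values enter only through their mean, so any amount of noise in the denominator leaves the reported standard error unchanged.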



A numerical example 

Before investigating the methods systematically, I will give a simple example. 
The example is taken from a study by Pang, Gao, and Wu (2002), the only 
study I found to provide enough detail to calculate the confidence limits 
using most of the different methods (because the sample sizes were small, the 
bootstrap would not make sense here and is omitted). The study reported 
four different ratios (which I denote with "P1", "P2", "P3", "P4") using
the index method. For more details about the data see the supplementary 
material provided with this article. Also, as a tutorial example, raw data 
and results are given for one of the ratios in Table 1, such that it is possible 
to reconstruct the calculations. 

Insert Table 1 & Figure 2 about here 

Figure 2a-d shows the 95% confidence limits according to each of the
methods. There are large differences: In some cases, all alternatives to the
Fieller method lead to a much smaller width of the confidence intervals than
the Fieller method. For example, in case "P1" the Fieller method gives an
upper limit of 498, while all other methods estimate upper limits below 10.
The discrepancy occurs because the denominator is just about significantly
different from zero. If there were slightly more variability in the denominator,
then it would not be significantly different from zero and the Fieller limits
would be unbounded. This can be seen if we use the geometric construction
method for the Fieller limits (Figures 2e and 2f). In case "P1" the ellipse
almost touches the y-axis (which would result in unbounded limits).

The discrepancy suggests that the coverage of the alternative methods 
is smaller than intended. We will see in the simulations that this is indeed 
often the case. Of course, the alternative methods are not always as bad 
as in case "P1". This can be seen in case "P2" where all methods lead to
similar results. 

Simulations 

We saw that the alternatives to Fieller's method can lead to a much
smaller coverage than intended. But how general is this problem? I will
present the results of Monte Carlo simulations which used known distribu- 
tions of (X,Y). All methods were applied to the simulated data, and the 
percentage of simulation runs in which the confidence limits contained the 
true values was determined. For a 95% confidence level, we expect that the 
confidence limits contain the true value in 95% of the simulation runs, while 
in 5% the confidence limits should not contain the true value (i.e., be signifi- 
cantly different from the true value). A liberal construction method will lead 
to a higher percentage of significant results, while a conservative construction 
method will lead to a lower percentage. 
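
The logic of such a coverage simulation can be sketched as follows. This is a simplified illustration, not the simulation code used for the figures: it assumes independent normal X and Y with expected values of 1 (so the true ratio is 1), compares only the Fieller and zero-variance methods, and uses hypothetical function names. For the Fieller method it uses the fact that the true ratio lies inside the confidence set exactly when $|T_0| \le t_q$, which also counts unbounded sets as covering.

```python
import math
import random

def covers_fieller(mx, my, vx, vy, cxy, t_q, rho):
    # rho is in the Fieller set iff |T_0(rho)| <= t_q (Equation 3).
    se = math.sqrt(vy - 2.0 * rho * cxy + rho ** 2 * vx)
    return abs(my - rho * mx) <= t_q * se

def covers_zero_variance(mx, my, vy, t_q, rho):
    # Limits ignore the variability of the denominator.
    return abs(my / mx - rho) <= t_q * math.sqrt(vy) / abs(mx)

def coverage(cv_x, cv_y, n, t_q, runs=2000, seed=0):
    """Fraction of runs in which each method's limits contain the true ratio."""
    rng = random.Random(seed)
    rho = 1.0
    hits_f = hits_z = 0
    for _ in range(runs):
        xs = [rng.gauss(1.0, cv_x) for _ in range(n)]
        ys = [rng.gauss(1.0, cv_y) for _ in range(n)]
        mx, my = sum(xs) / n, sum(ys) / n
        # Estimated variances and covariance of the sample means.
        vx = sum((x - mx) ** 2 for x in xs) / (n - 1) / n
        vy = sum((y - my) ** 2 for y in ys) / (n - 1) / n
        cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1) / n
        hits_f += covers_fieller(mx, my, vx, vy, cxy, t_q, rho)
        hits_z += covers_zero_variance(mx, my, vy, t_q, rho)
    return hits_f / runs, hits_z / runs

# 2.093 is the tabulated 97.5% quantile of the t-distribution with 19 df.
print(coverage(cv_x=0.5, cv_y=0.1, n=20, t_q=2.093))
```

With a denominator that is much more variable than the numerator, the second value falls far below the nominal level, while the first stays close to it.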

Methods 

Each simulation is described in terms of the sample size N of the paired 
measurements and the CVs of numerator and denominator. For simplicity 
the correlation is assumed to be zero, such that we explore a 3-dimensional
parameter-space ($CV_X$, $CV_Y$, N). This space is covered across typical ranges
in Figures 3 and 4. The use of the CVs allows us to compare the results of
different studies with the simulations (see the data points which are plotted
on top of Figures 3 and 4). More details on these studies are given in the
supplementary material provided with this article. Note that the simulations
are based on a correlation of zero which will not be the case in the example
studies. However, further simulations showed that the results are essentially
identical for a wide range of correlation coefficients (-.99 ... +.99), such that
this choice is not critical.

The random number generation for the normally distributed numerator 
and denominator was performed using an algorithm described by Kinderman 
and Ramage (1976) as implemented in the free data analysis language R 
(R Development Core Team, 2004). 95% confidence limits were calculated 
according to each of the methods, and the coverage of the true ratio $\rho$ was
determined. 

As "standard" bootstrap method I used the BCa method as described in
Davison and Hinkley (1997) and implemented in the R-boot package (the
S original was implemented by Angelo Canty, the R-port by Brian Ripley).
Results for the percentile method were similar to the BCa method and
are therefore not shown. The number of bootstrap replications was always
B = 2000. For the Hwang-bootstrap method, I performed a BCa bootstrap
on $T_0$ in Equation (3). Because the Hwang-bootstrap method is relatively
new, I describe it in more detail here: We have a sample of paired measurements
$(x_i, y_i)$ with $i = 1 \ldots N$ and want to bootstrap $T_0 = T_0(\bar x, \bar y, \rho)$ in
Equation (3). For this, we generate B bootstrap-samples. Each bootstrap-sample
consists of N pairs drawn with replacement from the original sample
$(x_i, y_i)$. For each bootstrap-sample, we calculate the means $(\bar x^*, \bar y^*)$. Following
Hwang (1995, p. 163/164 and p. 170), we use these bootstrap means
to determine the empirical distribution of $T_0^* = T_0(\bar x^*, \bar y^*, \hat\rho)$. Based on this
empirical distribution we determine the quantiles of $T_0$ and then proceed as
with the Fieller method. Note that the Hwang-bootstrap method (as well
as the Fieller method) is not restricted to paired measurements, but can also
be adapted to the case of independent observations (Fieller, 1954; Hwang,
1995; Lee & Lin, 2004). In this case, the denominator of $T_0$ will be different
to reflect the different estimate for the standard deviation of $\bar y - \rho \bar x$.
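
The resampling step can be sketched as follows. This is a deliberately simplified variant and not the procedure used in the simulations: it takes a plain percentile of $|T_0^*|$ (a symmetric quantile) instead of applying the BCa correction, it evaluates $T_0^*$ at the sample ratio $\hat\rho$, and all function names are hypothetical. For paired data, $T_0$ is identical to the one-sample t-statistic of the differences $d_i = y_i - \rho x_i$, which the sketch exploits.

```python
import math
import random

def t0(xs, ys, rho):
    """T_0 of Equation (3) for paired data, via d_i = y_i - rho * x_i."""
    n = len(xs)
    d = [y - rho * x for x, y in zip(xs, ys)]
    md = sum(d) / n
    var_md = sum((v - md) ** 2 for v in d) / (n - 1) / n
    if var_md == 0.0:
        # Degenerate resample; avoid division by zero.
        return 0.0 if md == 0.0 else math.copysign(math.inf, md)
    return md / math.sqrt(var_md)

def bootstrap_abs_t_quantile(xs, ys, alpha=0.05, n_boot=2000, seed=0):
    """Bootstrap estimate of the (1 - alpha) quantile of |T_0|."""
    rng = random.Random(seed)
    n = len(xs)
    rho_hat = sum(ys) / sum(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample pairs
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        stats.append(abs(t0(bx, by, rho_hat)))
    stats.sort()
    return stats[min(n_boot - 1, int((1.0 - alpha) * n_boot))]

data_rng = random.Random(1)
xs = [data_rng.gauss(10.0, 1.0) for _ in range(30)]
ys = [data_rng.gauss(20.0, 2.0) for _ in range(30)]
print(bootstrap_abs_t_quantile(xs, ys))
```

The returned value would take the place of $t_q$ when solving the quadratic equation, so that unbounded confidence sets remain possible.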

For the index method, additional calculations were performed using 
trimmed means and winsorized standard deviations as described by Tukey 
and McLaughlin (1963). Trimming was always 25% (cf. Rosenberger & 
Grasko, 1983). 
Results & Discussion

Figure 3 shows the results of the simulations for small sample sizes (N = 20). 
The empirical confidence levels of the Fieller method are not shown be- 
cause they are always close to 95%. The Hwang-bootstrap method performs 
equally well, while most other methods are only accurate if the CV of the 
denominator is small. The zero-variance method fails if the CV of the de- 
nominator is larger than that of the numerator. This leads to large deviations 
from the desired confidence level even in cases where the denominator has a 
small CV. Therefore we should not use the zero-variance method. 



Insert Figure 3 about here 

In Figure 3 all other methods are accurate if the denominator is typically 
significantly different from zero (this is left of the solid, vertical line). We 
might be tempted to infer from this that as soon as the denominator is 
significantly different from zero, all these methods are accurate. However, 
this is not the case for the index method, as can be seen if we increase the 
sample size. 

Figure 4 shows the results for larger sample sizes (N = 500). The area 
where the denominator is typically significant (again: left of the vertical solid 
line) is now larger and stretches further to the right. Accordingly, the area 
where the bootstrap (BC$_a$) and Taylor methods are accurate also stretches
further to the right. 

Insert Figure 4 about here 

For the index method, however, this area moved to the left. That is, by
increasing the sample size we lose accuracy with this method. This problem
occurs right in the area where most of the example studies are located (as
indicated by the points plotted on top of Figures 3 and 4). A closer
look shows that there is a band of correct confidence levels for denominator
CVs of about 1. We will see that left of this band the index method usually
overestimates the value of $\rho$ and right of this band the index method
underestimates it. These biases lead to the deviations from the desired
confidence level.

This can be seen in Figure 5, which shows the results of 40 simulations
as error-bar plots. The simulations were performed left of the band (point
"A"), in the band (point "B"), and right of the band (point "C"). We expect
about 2 significant deviations from the true ratio (given 95% confidence limits
and 40 simulation runs), and this is what we find for the Fieller method, both
bootstrap methods, and the Taylor method. This is not surprising because
for all these methods we are in unproblematic areas where the denominator is
typically significant (because the results were similar, only the results for the
Fieller method are shown; significant deviations are denoted by exclamation
marks in the lower part of the figure).

Insert Figure 5 about here 

For the index method, however, we can see two things: (a) The results are
much more variable than with the other methods. This is due to the fact that
the index method uses the individual ratios $y_i/x_i$ as the basis for
calculating the estimator. For some of these individual ratios, the
denominator will "hit" the problematic region around zero, and this will lead
to huge deviations in either the positive or the negative direction. The other
methods do not have this problem because they first reduce the variability of
the denominator by calculating its mean. (b) There is first a tendency to
overestimate the ratio $\rho$ (point "A"), then the estimate is noisy but
balanced (point "B"), and finally there is a systematic underestimation
(point "C").
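
The effect of a single denominator that "hits" the region around zero can be illustrated with a tiny, made-up data set (the numbers below are hypothetical and chosen only for illustration). The sketch contrasts the mean of the individual ratios, as used by the index method, with the ratio of the means.

```python
def ratio_of_means(xs, ys):
    """Ratio estimator based on the two sample means."""
    return (sum(ys) / len(ys)) / (sum(xs) / len(xs))


def mean_of_ratios(xs, ys):
    """Point estimator of the index method: mean of the individual ratios."""
    return sum(y / x for x, y in zip(xs, ys)) / len(xs)


# Hypothetical sample; the last denominator lies close to zero.
xs = [1.0, 0.9, 1.1, 0.01]
ys = [2.0, 1.8, 2.2, 2.0]

index_estimate = mean_of_ratios(xs, ys)   # dominated by 2.0 / 0.01 = 200
means_estimate = ratio_of_means(xs, ys)   # 8.0 / 3.01, about 2.66
```

Three of the four individual ratios equal 2, yet the single near-zero denominator pulls the mean of the ratios to about 51.5, whereas the ratio of the means stays near 2.66.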

Note that the biases cannot be eliminated by using trimmed means and
winsorized standard deviations. Trimming systematically excludes a certain
percentage of the most extreme values from the statistics (Tukey &
McLaughlin, 1963; Dixon & Tukey, 1968; Rosenberger & Grasko, 1983). Further
simulations showed that trimming does indeed reduce the huge variability of the
point estimator, but does not reduce the bias and therefore does not lead
to better confidence limits, as can be seen in the corresponding panels in
Figures 3 and 4.
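
For readers unfamiliar with trimming, the following sketch shows one common way to compute a trimmed mean together with a standard error based on the winsorized standard deviation. The formula $s_w / ((1 - 2\gamma)\sqrt{n})$ is an assumption about the variant used (it is one widely cited form); the function name and the data are hypothetical.

```python
import math


def trimmed_mean_and_se(values, trim=0.25):
    """Trimmed mean and a winsorized-SD based standard error.

    trim is the proportion removed from EACH tail (0.25 = 25% trimming).
    """
    data = sorted(values)
    n = len(data)
    g = int(trim * n)                 # observations trimmed per tail
    kept = data[g:n - g]
    t_mean = sum(kept) / len(kept)
    # Winsorizing: trimmed values are replaced by the nearest kept value.
    wins = [kept[0]] * g + kept + [kept[-1]] * g
    w_mean = sum(wins) / n
    w_var = sum((w - w_mean) ** 2 for w in wins) / (n - 1)
    se = math.sqrt(w_var) / ((1 - 2 * trim) * math.sqrt(n))
    return t_mean, se


# Hypothetical individual ratios with one extreme value.
ratios = [2.0, 2.1, 1.9, 2.0, 200.0, 2.2, 1.8, 2.0]
t_mean, t_se = trimmed_mean_and_se(ratios)
plain_mean = sum(ratios) / len(ratios)
```

The extreme value no longer dominates the estimate, which is the reduction in variability mentioned above. The sketch does not address the bias, which trimming leaves in place.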

In summary, the index method can fail badly if applied to bivariate normal
data even in situations in which the denominator is significantly different
from zero. In these situations, the Taylor and standard bootstrap methods
both perform well, while they fail if the denominator is not significantly
different from zero. The zero-variance method fails if the denominator has
larger variability than the numerator, while the Hwang-bootstrap and Fieller
methods never failed in these simulations.

Recommendations for the standard case

Based on the previous discussion we can issue the following recommendations
for the "standard" case that the means of numerator and denominator
are approximately normally distributed (cf. Table 2): The Fieller method and
the Hwang-bootstrap method can generally be used, with the Hwang-bootstrap
method having advantages if there are deviations from normality. If the
denominator is clearly significantly different from zero ($CV_{\bar{X}} < 1/3$
if 95% confidence limits are intended), we can also use the Taylor and the
standard bootstrap methods. Note that with sample sizes smaller than $N = 15$
the bootstrap methods (including the Hwang-bootstrap) lead to slightly smaller
coverage than intended (for the sake of brevity these simulations are not shown).
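
As a rough illustration of how these recommendations translate into a calculation, the sketch below computes the CV of the denominator mean together with Fieller and Taylor limits. It assumes the standard textbook forms of both methods (the article's own equations are not repeated here), takes the critical value `t_crit` as an input because the Python standard library has no t-quantile function, and uses hypothetical function names and numbers.

```python
import math


def cv_of_mean(xbar, vx):
    """CV of the denominator mean; vx is the variance of the mean."""
    return math.sqrt(vx) / abs(xbar)


def fieller_limits(xbar, ybar, vx, vy, vxy, t_crit):
    """Solve (ybar - rho*xbar)^2 <= t^2 (vy - 2 rho vxy + rho^2 vx).

    Returns None if the denominator is not significantly different from
    zero, in which case the confidence set is unbounded.
    """
    a = xbar ** 2 - t_crit ** 2 * vx
    b = xbar * ybar - t_crit ** 2 * vxy
    c = ybar ** 2 - t_crit ** 2 * vy
    if a <= 0:
        return None
    root = math.sqrt(b * b - a * c)
    return (b - root) / a, (b + root) / a


def taylor_limits(xbar, ybar, vx, vy, vxy, t_crit):
    """First-order Taylor (delta method) limits, symmetric around ybar/xbar."""
    rho = ybar / xbar
    se = math.sqrt(vy - 2 * rho * vxy + rho ** 2 * vx) / abs(xbar)
    return rho - t_crit * se, rho + t_crit * se


# Hypothetical summary statistics (variances refer to the means).
stats = dict(xbar=10.0, ybar=20.0, vx=0.25, vy=1.0, vxy=0.0)
f_lo, f_hi = fieller_limits(t_crit=2.0, **stats)   # about (1.735, 2.305)
t_lo, t_hi = taylor_limits(t_crit=2.0, **stats)    # about (1.717, 2.283)
```

With a denominator CV of 0.05 the two intervals are close, and the Fieller limits are slightly asymmetric around the point estimate of 2. As the CV of the denominator mean grows, the Fieller limits widen asymmetrically and finally become unbounded, while the Taylor limits stay symmetric.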

Insert Table 2 about here 



The index method and the zero-variance method are problematic and should not
be used. This does not mean that studies which used the index or
zero-variance methods are necessarily wrong. For both methods there
are areas in Figures 3 and 4 where they lead to the intended
confidence level. For the index method this is the case if the CV of the
denominator is so small that the individual denominator values will hardly
ever get close to zero (for 95% confidence limits this corresponds to
$CV_X < 1$ and $CV_X < 0.03$ for $N = 20$ and $N = 500$, respectively). For
the zero-variance method this is the case if the CV of the numerator exceeds
the CV of the denominator. Also note that the index method can be appropriate
if the data show a specific form of heteroscedasticity; see the section "When
can we use indices?".

When can we use regression methods? 

We can view a ratio as the slope of a linear relationship with zero intercept. 
Therefore the question arises whether we could use standard regression meth- 
ods to estimate the ratio and its confidence limits — which would be easier 
and more flexible than the methods discussed so far. And indeed, this is 
sometimes possible. However, we have to be careful about the assumptions 
we make. We will see that the most critical question is whether there is 
error in the measurement of the regressor (corresponding to the denominator 
of the ratio). Depending on this, we might have to use the more complex
measurement error models instead of standard regression models. In the first
part of this section I will describe under which conditions we can choose re- 
gression models and in the second part I will describe situations which can 
be dealt with by regression methods. 

Measurement error models vs. standard regression 

To give an overview, I will describe the linear model in a form which allows for
measurement error in the response as well as in the regressor (cf. Madansky,
1959; Fuller, 1987; Schaalje & Butts, 1993; Buonaccorsi, 1994, 1995; Cheng
& Van Ness, 1999). Consider a regression model on true values:

$$v_i = \alpha + \beta u_i + e_i \tag{10}$$

with $(u_i, v_i)$ being the true values of the paired measurements
$(x_i, y_i)$. The error $e_i$ is often called "error in the equation" and
assumed to be i.i.d. with zero mean and constant variance. To specify the
model, we also need to know whether the $u_i$ are random (the structural case)
or whether they are fixed (the functional case, cf. Kendall, 1951, 1952;
Dolby, 1976).

Often, there is measurement error (or "error in the variables"), such that
the observed values $(x_i, y_i)$ do not correspond to the true values
$(u_i, v_i)$.

Typically, the errors are assumed to be additive:

$$x_i = u_i + c_i, \qquad y_i = v_i + d_i \tag{11}$$

with the measurement errors $c_i$ and $d_i$ each assumed to be i.i.d. with
expected values zero and all being uncorrelated with the $u_i$ and the $e_i$
of Equation (10). Using this model we can discuss the standard regression
model, as well as measurement error models.

First, assume that the true values can be observed exactly such that
$x_i = u_i$ and $y_i = v_i$ (i.e., $c_i$ and $d_i$ are zero with zero
variance). This results in the model:

$$y_i = \alpha + \beta x_i + e_i \tag{12}$$

This is the classic regression situation and we can use standard regression
methods in both the structural and the functional case (e.g., Madansky,
1959; Sampson, 1974).

Second, assume that there is measurement error in the response $v_i$ (i.e.,
$d_i$ has non-zero variance), while we still can observe the regressor exactly
such that $x_i = u_i$. This results in the model:

$$y_i = \alpha + \beta x_i + e_i + d_i \tag{13}$$

This model is similar to the classic regression model of Equation (12) and
indeed we can use standard regression methods if we assume that $d_i$ has
constant variance. However, the estimation of the parameters can be improved
if we have information about the measurement error, which is typically
obtained by repeated measures on the same subject. With this information, it
is also possible to account for non-equal variances of $d_i$ (Buonaccorsi,
1994, 1995), as well as for non-additive errors (Buonaccorsi, 1989, 1996).

Third, assume that the true values cannot be observed exactly (i.e., $c_i$ and
$d_i$ have non-zero variance). In this case we have to use measurement error
models. There is a large variety of different models. Standard textbooks
are Fuller (1987) and Cheng and Van Ness (1999). Here, I will sketch two
variants: (a) A "classic" structural errors-in-variables model. This model is
interesting because it shows the typical issues related to measurement error
models as well as the connection to the Fieller method. (b) The "Berkson
model of a control experiment", which offers an alternative solution if we
have control over $x_i$ and which allows us to use standard regression methods.

For the "classic" structural errors-in-variables model assume that there
is no "error in equation" (i.e., $e_i$ is zero with zero variance) and that
the measurement errors $c_i$ are uncorrelated with the measurement errors
$d_i$. Also, assume that the true values $u_i$ and the measurement errors are
normally distributed. These assumptions create a bivariate normal distribution
for the pair of observable variables $(x_i, y_i)$. The model has three
important properties: (i) If we ignored the measurement error and used a
standard regression procedure, this would lead to a downward bias in the
estimate for the slope $\beta$. This bias is often called "attenuation" or
"regression dilution" (cf. Spearman, 1904; Schmidt & Hunter, 1996; DeShon,
1998; Frost & Thompson, 2000; Charles, 2005). The importance of this issue can
be seen by the fact that attenuation was a key argument in a recent discussion
on semantic priming in psychology (Greenwald, Draine, & Abrams, 1996; Draine &
Greenwald, 1998; Dosher, 1998; Klauer, Draine, & Greenwald, 1998; Miller,
2000; Klauer, Draine, & Greenwald, 2000). Note, however, that attenuation
is only a problem if we estimate $\alpha$ or $\beta$. If we only want to
predict $y$ given a certain $x$ then we can use standard regression
procedures. Also, if the measurement errors are correlated it is possible that
there is no attenuation but that the slope is overestimated by standard
regression procedures (Schaalje & Butts, 1993). (ii) The model is
nonidentifiable as long as we don't have additional information about the
error variances, such that we cannot obtain a unique solution (cf. Reiersol,
1950; Madansky, 1959). This additional information can, for example, be the
ratio of the variances of the measurement errors $c_i$ and $d_i$, which could
be estimated by repeated measure methods. (iii) If the intercept is zero, the
nonidentifiability problem disappears and the appropriate solution is the
Fieller method. This shows the connection between measurement error models and
the Fieller method.
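
Property (i), the attenuation of the slope, can be made concrete with a small simulation. The sketch below is an illustration under assumed parameter values (zero intercept, true slope 2, unit variances for the true values and for the measurement error in the regressor); in this setting the ordinary least-squares slope is expected to shrink by the factor $\sigma_u^2 / (\sigma_u^2 + \sigma_c^2) = 0.5$. All names are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x (with intercept)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)
beta = 2.0                                          # assumed true slope
u = [rng.gauss(0, 1) for _ in range(20000)]         # true regressor values
x = [ui + rng.gauss(0, 1) for ui in u]              # observed with error c_i
y = [beta * ui + rng.gauss(0, 0.5) for ui in u]     # observed with error d_i

slope_exact_regressor = ols_slope(u, y)   # close to 2.0
slope_noisy_regressor = ols_slope(x, y)   # attenuated, close to 1.0
```

Measurement error in the response alone leaves the slope estimate unbiased, whereas measurement error in the regressor halves it in this example.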

In the "Berkson model of a control experiment" we assume that we have
control over $x_i$, even though we cannot measure the corresponding true values
accurately. This enables us to observe $y_i$ at fixed, predefined $x_i$-values.
If we assume that the measurement errors $c_i$ have zero mean, then we get a
model which is quite different from the classic measurement error model:
While in the classic measurement error model the true values $u_i$ and the
measurement errors $c_i$ are uncorrelated, now the $u_i$ and the $c_i$ are
perfectly negatively correlated. Berkson (1950) showed that in this case we
can use standard regression methods (see also Madansky, 1959; Fuller, 1987;
Cheng & Van Ness, 1999). For a discussion of this model in the context of
repeated measure designs with multiple subjects, see Buonaccorsi and Lin (2002).

In summary, we can use standard regression methods if: (a) we can measure
the $x_i$ very accurately; (b) we do not estimate $\alpha$ or $\beta$, but
only want to predict $y$, given a certain $x$; (c) we have control over $x_i$,
even though we cannot measure the corresponding true value accurately
("Berkson model of a controlled experiment").

Application of regression methods to ratios 

With regression models the situation is simpler and we can apply standard
methods as described in most textbooks. This easily allows us to estimate
$\alpha$, $\beta$ and the corresponding confidence intervals. If we assume the
intercept $\alpha$ to be zero, then $\beta$ corresponds to our ratio of
interest. Regression methods can, if applicable, also help us with more
complicated situations. For example, if we want to compare two or more ratios
obtained in different groups we can use the analysis of covariance (ANCOVA).
For this we set up the model:

$$y_{gi} = \beta_g x_{gi} + e_{gi} \tag{14}$$

with $x_{gi}$, $y_{gi}$ being the values obtained in group $g = 1 \ldots m$
for participant $i = 1 \ldots n$; $\beta_g$ the ratios of interest and
$e_{gi}$ the errors. Standard ANCOVA methods then allow us to decide whether
the ratios are different (Miller, 1986).
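
For a single group, the zero-intercept model reduces to a regression through the origin, whose slope is the ratio of interest. The sketch below uses the standard least-squares formulas for this case and assumes, as required above, that the regressor is measured with negligible error; the function name and data are hypothetical.

```python
import math


def slope_through_origin(xs, ys):
    """Least-squares slope of y = beta * x (no intercept) and its SE."""
    sxx = sum(x * x for x in xs)
    beta = sum(x * y for x, y in zip(xs, ys)) / sxx
    resid_ss = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    # One parameter is estimated, hence n - 1 residual degrees of freedom.
    se = math.sqrt(resid_ss / (len(xs) - 1) / sxx)
    return beta, se


# Hypothetical measurements from one group.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
beta_hat, beta_se = slope_through_origin(xs, ys)   # slope = 59.7 / 30
```

Confidence limits follow as `beta_hat` plus or minus the appropriate t-quantile times `beta_se`. Comparing several groups as in Equation (14) would fit one such slope per group within a common model.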

Note, however, that even in situations in which we can use regression 
methods we sometimes need Fieller's method. This is the case if we want to 
calculate ratios of the parameters estimated by regression methods, as for ex- 
ample in the inverse prediction described in the Introduction. Another classic 
example is the slope ratio assay (Finney, 1978). Here, researchers first cal- 
culate an ANCOVA as described in Equation (14), but then are interested 
in the ratio of two $\beta_g$ estimates (typically indicating the effectiveness of a
drug relative to a standard drug). Again, they need Fieller's method for this 
ratio of regression parameters. In general, it is possible to calculate Fieller 
confidence limits for linear combinations of parameters of general linear mod- 
els (Zerbe, 1978), generalized linear models (Cox, 1990), and mixed-effects 
models (Young, Zerbe, & Hay, 1997). For examples of such applications see 
Buonaccorsi and Iyer (1984) and Sykes (2000). 

Ratios of estimated parameters can also occur in the context of nonlinear 
regression models (e.g., Bates & Watts, 1988) and nonlinear mixed effects 
models (e.g., Davidian & Giltinan, 1995; Vonesh & Chinchilli, 1997; Pinheiro 
& Bates, 2002). These models are designed to deal with general nonlinear 
problems and therefore can also deal with ratios. In addition, the nonlinear 
mixed effects models can handle repeated measure data, as are frequent in 
psychological and biological research. Typically, these models perform a 
linear approximation at the point of the estimated parameters and therefore 
can fail in a similar way as the Taylor method discussed in this article. But, 
if the denominators of the ratios have small CVs, these models will provide
an elegant solution such that it can be beneficial to reformulate a statistical 
problem in terms of a nonlinear model (see also Cox, 1990). 

When can we use indices? 

Linear models as described in the previous section assume that the variance of
the residual errors is constant over the range of observations
("homoscedastic"). Sometimes this is clearly not the case (the errors are
"heteroscedastic"). Two classes of models can improve this situation and lead
to an interest in indices. Both models use standard regression methods, such
that as soon as the models are specified the specification of confidence
limits for the ratio of interest poses no additional problems. For simplicity
of presentation, I will assume in the following that the denominator of the
ratios is bounded away from zero such that it cannot attain values close to
zero. This is often the case in situations in which indices are used (cf.
Belsley, 1972). Without this assumption we cannot justify the use of indices.
Also, I assume that the regressor can be measured with negligible error, such
that we don't need to use measurement error models.

First, consider the case that we want to fit a structural regression model
to our data:

$$y_i = \alpha + \beta x_i + e_{1,i} \tag{15}$$

Assume that the residual errors are heteroscedastic. This can lead to
serious deviations from the desired confidence level if we use standard
regression methods to determine confidence limits for $\alpha$ and $\beta$.
Often it is possible to correct for the heteroscedasticity by using weighted
least squares analysis (e.g., Miller, 1986). Sometimes, it turns out that the
errors are proportional to the absolute size of $x_i$ such that the $y_i$
spread out with larger $x_i$. In this case, we can use a special variant of
weighted least-squares analysis and divide the whole equation by $x_i$ (Kuh &
Meyer, 1955; Firebaugh & Gibbs, 1985; Kronmal, 1993):

$$\frac{y_i}{x_i} = \alpha \frac{1}{x_i} + \beta + e_{2,i} \tag{16}$$

If now the assumptions of regression models are met, most notably that
the new error term $e_{2,i} = e_{1,i}/x_i$ is homoscedastic and normally
distributed, then we can determine confidence limits using standard regression
methods (with $y_i/x_i$ and $1/x_i$ being response and regressor,
respectively). Note that although this method uses indices, it estimates the
same parameters $\alpha$ and $\beta$ as the standard linear model in Equation
(15) (Firebaugh & Gibbs, 1985). It is also possible to have more than one
regressor; an example is given in the next section ("Beware: Spurious
correlations and faulty ratio standards") in Equation (25). These models are
often used in econometrics, where the ratios are often called "deflated
variables" and the denominator "deflator". For a discussion of non-random
denominators see Belsley (1972) and for a discussion of the case with
measurement error in the denominator see Casson (1973).

Using this model we can also justify the index method, if we assume that
$\alpha$ in Equation (16) is zero:

$$\frac{y_i}{x_i} = \beta + e_i \tag{17}$$

with $\beta$ being the ratio of interest and $e_i$ being the error, typically
assumed to be i.i.d. normal. Note the specific heteroscedastic structure we
have to assume to justify this method.
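
The division approach of Equation (16) and the index method of Equation (17) can be compared on a small noise-free example. The sketch below regresses $y_i/x_i$ on $1/x_i$ by ordinary least squares and, for contrast, computes the plain mean of the ratios; the data are hypothetical and were constructed with a non-zero intercept ($\alpha = 3$, $\beta = 2$) to show what happens when the index method's assumption $\alpha = 0$ does not hold.

```python
def fit_line(us, vs):
    """Ordinary least squares of v on u; returns (intercept, slope)."""
    n = len(us)
    ubar, vbar = sum(us) / n, sum(vs) / n
    slope = (sum((u - ubar) * (v - vbar) for u, v in zip(us, vs))
             / sum((u - ubar) ** 2 for u in us))
    return vbar - slope * ubar, slope


def fit_by_division(xs, ys):
    """Fit y/x = alpha * (1/x) + beta; returns (alpha, beta)."""
    inv_x = [1.0 / x for x in xs]
    ratios = [y / x for x, y in zip(xs, ys)]
    intercept, slope = fit_line(inv_x, ratios)
    # The slope on 1/x is alpha; the intercept is beta.
    return slope, intercept


# Hypothetical noise-free data generated from y = 3 + 2 x.
xs = [1.0, 2.0, 4.0, 5.0, 10.0]
ys = [3.0 + 2.0 * x for x in xs]

alpha_hat, beta_hat = fit_by_division(xs, ys)                   # (3, 2)
index_estimate = sum(y / x for x, y in zip(xs, ys)) / len(xs)   # 3.23
```

The divided regression recovers both parameters, whereas the mean of the ratios (3.23) overstates $\beta = 2$ because the non-zero intercept is absorbed into the index.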

Second, consider an allometric or power function model (Kleiber, 1947;
Sholl, 1948; Nevill, Ramsbottom, & Williams, 1992; Nevill & Holder, 1994,
1995a; Dreyer & Puzio, 2001):

$$y_i = \beta x_i^{\gamma} e_i \tag{18}$$

with $(x_i, y_i)$ being the observed values, $\beta$ and $\gamma$ the
parameters, and $e_i$ the error term. Sometimes the parameters can be
estimated by log-transformation to a log-linear model. This results in:

$$\log(y_i) = \log(\beta) + \gamma \log(x_i) + \log(e_i) \tag{19}$$

If the assumptions of regression models are met for the log-linear model,
most notably that $\log(e_i)$ is homoscedastic and normally distributed, then
we can use standard regression methods on Equation (19) to determine the
confidence limits of $\log(\beta)$ and $\gamma$.
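
The log-linear fit of Equation (19) amounts to an ordinary regression of $\log(y_i)$ on $\log(x_i)$. A minimal sketch, with hypothetical names and noise-free example data generated from $\beta = 2$ and $\gamma = 1.5$:

```python
import math


def fit_power_law(xs, ys):
    """Fit y = beta * x**gamma by least squares on the log-log scale."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    gamma = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    beta = math.exp(my - gamma * mx)   # back-transform the intercept
    return beta, gamma


# Hypothetical noise-free data; both variables must be positive.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.0 * x ** 1.5 for x in xs]
beta_hat, gamma_hat = fit_power_law(xs, ys)
```

An estimated exponent close to one would indicate that the simple ratio $y_i/x_i$ is an adequate summary. With real data, confidence limits for $\gamma$ come from the usual regression output on the log scale.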

Allometric models can also incorporate more than two variables. A good
example is given by Nevill et al. (1992), who showed that for recreational
runners the 5-km run speed ($z_i$) is well predicted by an index of maximum
oxygen uptake ($y_i$) and body mass ($x_i$). For this, they used the
allometric model:

$$z_i = y_i^{\gamma_1} x_i^{\gamma_2} e_i \tag{20}$$

and fitted it with log-linear regression. As a result, they obtained the fit:

$$z = 84.3 \, \frac{y^{1.01}}{x^{1.03}} \tag{21}$$

Because the exponents are close to one, the fit contains essentially an index, 
such that in this case the use of an index seems warranted. Also, the model 
turned out to be superior to linear models and it seems biologically plausible 
that performance is affected by oxygen uptake relative to body mass. (For
a further discussion of allometric models and the relation to indices see also
Nevill & Holder, 1995a, 1995b; Kronmal, 1993, 1995). 

In summary, there are two models which make accepted use of index
variables: The linear model with correction for heteroscedasticity by division
and the allometric model. Both models assume that the denominator is
bounded away from zero and that the data are heteroscedastic, with the
$y_i$ spreading out in a fan-like fashion with larger $x_i$ values.



Beware: Spurious correlations and faulty ratio standards



Spurious correlations are a famous and much discussed problem (e.g., Pear- 
son, 1897; Kronmal, 1993; McShane, 1995; Nevill & Holder, 1995b; Kron- 
mal, 1995). We will see that spurious correlations can occur if numerator 
and denominator of a ratio are linearly related with non-zero intercept and 
inappropriate methods are used, typically involving indices. 

Consider that we are interested in the relationship between two measurements
$y_i$ and $z_i$, but want to "correct" for the effect of a third variable
$x_i$. A famous, hypothetical example was given by Neyman (1952): A researcher
relates the number of babies to the number of storks in a number of different
counties. Because larger counties are home to more women and consequently
more babies (and more storks), the researcher wants to correct for the number
of women. Table 3 gives a simplified version of the data.



Insert Table 3 about here

First, consider the accepted way to do the correction: For this we use
partial regression analysis and set up a restricted model and a full model:

$$y_i = \alpha_{\mathrm{rest}} + \beta_{\mathrm{rest}} x_i + e_{1,i} \tag{22}$$

$$y_i = \alpha_{\mathrm{full}} + \beta_{\mathrm{full}} x_i + \gamma z_i + e_{2,i} \tag{23}$$

with $y_i$ being the number of babies, $x_i$ the number of women, and $z_i$
the number of storks; $x_i$ and $z_i$ are assumed to be random and measured
with negligible error. We can think of partial regression as a two-step
process: We first fit the restricted model (22). This model is designed to
linearly predict the number of babies ($y_i$) based on the number of women
($x_i$). In the second step, we determine how much the fit is improved if we
use the full model (23), which also includes the number of storks ($z_i$).
(If $z$ were a factor, we would replace $\gamma z_i$ by parameters indicating
the effects at the different factor levels. This would correspond to an
ANCOVA; cf. Maxwell, Delaney, & Manheimer, 1985). Visual inspection of the
example data in Table 3 shows that the number of babies is almost perfectly
predicted by the number of women, such that the addition of the storks does
not improve the fit significantly. Therefore, we conclude that the number of
storks has no influence on the number of babies.

This standard partial regression analysis assumes again that the errors
$e_{1,i}$ and $e_{2,i}$ are homoscedastic. If the errors are heteroscedastic
and scale with the size of $x_i$, we can apply the correction discussed in the
section "When can we use indices?" and divide both equations by $x_i$. This
results in:

$$\frac{y_i}{x_i} = \alpha_{\mathrm{rest}} \frac{1}{x_i} + \beta_{\mathrm{rest}} + e_{3,i} \tag{24}$$

$$\frac{y_i}{x_i} = \alpha_{\mathrm{full}} \frac{1}{x_i} + \beta_{\mathrm{full}} + \gamma \frac{z_i}{x_i} + e_{4,i} \tag{25}$$

If now the errors are homoscedastic, we can proceed as before with the
standard regression methods. Note that these corrected models estimate the
same parameters $\alpha_{\mathrm{rest}}$, $\alpha_{\mathrm{full}}$,
$\beta_{\mathrm{rest}}$, and $\beta_{\mathrm{full}}$ as the models (22) and
(23) (Firebaugh & Gibbs, 1985).

Now, consider the problematic way to do the correction: Here, we simply
divide the number of babies ($y_i$) and storks ($z_i$) by the number of women
($x_i$) and then investigate the linear relationship between these individual
ratios. This results in:

$$\frac{y_i}{x_i} = \beta_{\mathrm{full}} + \gamma \frac{z_i}{x_i} + e_{5,i} \tag{26}$$

Typically, $\gamma$ is tested against zero. This corresponds to a comparison
of the full model (26) with the restricted model:

$$\frac{y_i}{x_i} = \beta_{\mathrm{rest}} + e_{6,i} \tag{27}$$

Comparing these models to the partial regression models with correction for
heteroscedasticity in Equations (24) and (25) shows that the models are
equivalent if we assume that $\alpha_{\mathrm{rest}}$ and
$\alpha_{\mathrm{full}}$ in Equations (24) and (25) are zero. Because
$\alpha_{\mathrm{rest}}$ and $\alpha_{\mathrm{full}}$ correspond to the
intercepts in Equations (22) and (23), this means that we assume the
intercepts of the linear relationships between $y_i$ and $x_i$ to be zero.
This assumption of zero intercept is at the core of the debate about spurious
correlations (Kuh & Meyer, 1955; Firebaugh, 1988). The problem is that if the
intercepts deviate from zero then the correction does not work properly.

In our stork example, this can be seen in Table 3: The birth rate $y_i/x_i$
is highly and significantly correlated with the stork rate $z_i/x_i$, such that
based on this problematic analysis we would conclude that there is a strong
dependence. This dependence, however, is only "spurious" and is generated
by the fact that the intercepts in Equations (22) and (23) are not zero
(if we extrapolate the data, there are $y_0 \approx 10$ babies at $x_0 = 0$ women).
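
How a non-zero intercept alone can generate such a correlation is easy to reproduce numerically. The sketch below uses made-up numbers (not the data of Table 3): $y$ is an exact linear function of $x$ with intercept 10, and $z$ merely alternates between two values. All names are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson product-moment correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # e.g. number of women
y = [10.0 + 2.0 * xi for xi in x]       # linear in x, intercept 10
z = [4.0, 5.0, 4.0, 5.0, 4.0, 5.0]      # alternates, no relation to y given x

r_raw = pearson_r(y, z)                 # roughly 0.29
r_ratio = pearson_r([yi / xi for xi, yi in zip(x, y)],
                    [zi / xi for xi, zi in zip(x, z)])   # roughly 0.99
```

Both ratios share the term $1/x_i$, so dividing by $x_i$ turns a weak raw correlation into an almost perfect one, even though $y$ is completely determined by $x$ and $z$ adds no information.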

Now, one might argue that there is indeed theoretical reason to assume 
that the intercept should be zero: Obviously, if there are no women, there 
cannot be any babies (McShane, 1995, but see: Kronmal, 1995). However, 
this zero-point can easily be obtained if the data are non-linear beyond the 
range of observations. An every-day example would be the fuel-efficiency 
of cars. For longer distances, the fuel consumed is linear in the distance
traveled. For short distances, however, this linearity breaks down because
here cars need a disproportionately large amount of fuel. Therefore, we would
be wrong if we compared the fuel efficiency (expressed as a ratio: miles per 
gallon or liters per 100 kilometers) of one car that was used for short distances 
with that of another car that was used for long distances. 

Of course, the assumption of zero intercept is not always wrong. But,
given all the potential problems involved if it is violated, it should be
carefully tested or there should be serious theoretical reasons to assume a
linear model with zero intercept. There are ample examples of studies which
likely fell prey to spurious correlations (cf. Kuh & Meyer, 1955; Kronmal, 1993).
Also note that the problematic method described above relies on a second
strong assumption, namely the assumption of heteroscedasticity with the
errors scaling with the size of $x_i$. This should also be tested. A good example
of a careful model-test which also considered the potential heteroscedasticity 
of the data is the study of Nevill et al. (1992) on the ratio of maximum oxygen 
uptake and body mass in recreational runners (see the section "When can 
we use indices?"). 

A problem closely related to spurious correlations is the problem of faulty
ratio standards. If ratios are used to define a medical standard for the
"normal" or average human, and if the data have non-zero intercept, then this
standard can lead to serious biases. For example, Tanner (1949) showed that
stroke volume of the heart is linearly related to body weight with positive,
non-zero intercept. Because, however, the average ratio was used as standard,
the average lightweight person was automatically above the standard
and the average heavy person was automatically below the standard. Tanner
(1949) gives a large number of further illustrative examples and concludes
that many patients classified as having deviant values "may have been
suffering from no more formidable a disease than statistical artefact" (p. 3).

In summary, spurious correlations and faulty ratio standards can occur if 
we use ratios on data which are linearly related, but with non-zero intercept. 
In the literature, this problem is typically discussed together with the use of 
indices, but it is not restricted to indices. The use of indices only adds the 
additional assumption of heteroscedasticity with the errors scaling with the 
size of the denominator. 

Summary and Conclusions 

Ratios of measured quantities pose unusual statistical problems. When dealing
with ratios we should first clarify whether we are justified in using a ratio,
that is, whether the numerator can safely be assumed to be a linear function
of the denominator with zero intercept. Otherwise we should use a
linear model with non-zero intercept; see the section "Beware: Spurious
correlations and faulty ratio standards". If a ratio is appropriate, we can use
the methods summarized in Table 2 and described in the section "The standard
case".

Sometimes, we can simplify the calculations by using standard regression 
methods. This is typically the case if the denominator can be measured with 
negligible error; see the section "When can we use regression methods?". 
Regression methods can help us also in more complicated cases in which, 
for example, we want to compare ratios or have repeated measure data. If 
it turns out that the residuals are heteroscedastic and if the denominator is
bounded away from zero, the index method or allometric models can be potential
remedies; see the section "When can we use indices?".

This shows that the wide use of the index method (illustrated by the
example studies provided with this article) rests on very specific and likely
often problematic assumptions. The simulations in the section "The standard
case" show that the index method can lead to large deviations from the
desired confidence level if these assumptions are not met. Also, the point
estimate closely associated with this method (i.e., the mean ratio) can lead
to systematic biases and much more variable estimates than the ratio of the
means. Therefore we should not use the index method as long as there is no
indication for the specific heteroscedastic structure assumed by this method.
A simple, straightforward alternative for cases in which one might be tempted
to use the index method (i.e., if the denominator is bounded away from zero)
is the Taylor method. For this, all the researcher needs to do is to use
Equation (7) instead of the index method. Of course, all the other methods
described in Table 2 would also be viable alternatives (most notably, standard
bootstrap methods if there are deviations from normality). If, however, the
denominator is not bounded away from zero, we best use either the Fieller
method or the Hwang-bootstrap method.

Figure Legends

Figure 1: Qualitative behavior and geometric construction of the Fieller
confidence limits for $\rho$. The confidence limits (indicated by the thick,
solid vertical lines) can be constructed using a wedge which forms tangents
to an ellipse centered at $(\bar{x}, \bar{y})$. The size of this ellipse is
such that its projection onto the abscissa corresponds to the marginal
confidence interval of $E(X)$, the projection onto the ordinate corresponds to
the marginal confidence interval of $E(Y)$, and the shape of the ellipse is
determined by the covariance of $\bar{x}$ and $\bar{y}$. If the denominator is
significantly different from zero at a significance level of $\alpha$, then
the ellipse will not touch the ordinate and the $(1 - \alpha)$ confidence
limits will be bounded (left panel). If the denominator is not significantly
different from zero, the ellipse will touch the ordinate and the confidence
limits will be unbounded (middle and right panel, the arrows indicate
infinity). In the unbounded/exclusive case, we still can exclude a small
interval (the dashed vertical line in the middle panel), while in the
unbounded case we cannot exclude any value at all (right panel).

Figure 2: a-d. Comparison of the 95% confidence limits for the example data
from Pang et al. (2002), as calculated by the Fieller, Taylor, index,
and zero-variance methods. The study reported four different ratios
in four different conditions (denoted here by "P1", "P2", "P3", "P4").
For P1, the upper Fieller limit is 498 (which is beyond the upper limit of
the y-scale). e-f. The geometrical construction method applied to the
conditions P1 and P2. At P1 the construction ellipse just about touches
the y-axis, which leads to an almost infinite upper Fieller limit. If the
ellipse touched the y-axis, we would get unbounded/exclusive Fieller
limits.

Figure 3: Empirical confidence levels of the different methods for small sample
sizes (N = 20). The empirical confidence levels of the Fieller method
are not shown, because they are of course always close to the expected
95%. The empirical confidence levels are color-coded. For example, light
gray corresponds to an empirical confidence level between 90% and 99%.
Left of the solid vertical line the denominator is typically significantly
different from zero. (The line is drawn at $CV_{\bar{x}} = CV_X/\sqrt{20} = 0.5$,
which corresponds to $CV_X \approx 2.2$ on the abscissa.) The dotted
diagonal line indicates equal CVs of numerator and denominator. In
each panel the CVs that were reported in a number of example studies
are plotted as single data points (see also the supplementary material
provided with this article). At the CVs indicated with "A", "B", "C",
and "D" further simulations were run; see Figure 5.
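Empirical confidence levels of this kind come from Monte Carlo simulation. The following Python sketch shows the idea for a single point of the CV plane. It is not the simulation code behind the figures: the choice of independent normal X and Y with population means of 1, the number of runs, the seed, and all function names are assumptions made for illustration. The Fieller check uses the fact that the true ratio $\rho$ lies inside the Fieller limits exactly when the mean of $y_i - \rho x_i$ is not significantly different from zero.

```python
import math
import random

# Two-sided 95% quantile of Student's t for df = 19 (i.e., N = 20).
# Assumption: if n is changed below, this quantile must be changed too.
T_95_DF19 = 2.093


def mean_and_se(values):
    """Return the sample mean and the standard error of the mean."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / (n - 1)
    return m, math.sqrt(var / n)


def fieller_covers(x, y, rho, t):
    """True if rho lies inside the Fieller confidence set for E(Y)/E(X)."""
    # Equivalent to a one-sample t-test on d_i = y_i - rho * x_i.
    d = [yi - rho * xi for xi, yi in zip(x, y)]
    m, se = mean_and_se(d)
    return abs(m) <= t * se


def index_covers(x, y, rho, t):
    """True if rho lies inside mean(y_i / x_i) +/- t * SEM (index method)."""
    ratios = [yi / xi for xi, yi in zip(x, y)]
    m, se = mean_and_se(ratios)
    return abs(m - rho) <= t * se


def empirical_levels(n=20, cv_x=0.75, cv_y=0.1, runs=2000, seed=1, t=T_95_DF19):
    """Return the empirical confidence levels (Fieller, index) at one CV point."""
    rng = random.Random(seed)
    rho = 1.0  # true ratio, because both population means are set to 1
    hits_fieller = hits_index = 0
    for _ in range(runs):
        x = [rng.gauss(1.0, cv_x) for _ in range(n)]
        y = [rng.gauss(1.0, cv_y) for _ in range(n)]
        hits_fieller += fieller_covers(x, y, rho, t)
        hits_index += index_covers(x, y, rho, t)
    return hits_fieller / runs, hits_index / runs
```

Calling `empirical_levels()` returns the two levels as fractions. Under these assumptions the Fieller level should be close to 0.95, whereas the level of the index method can deviate from it when the denominator comes close to zero.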

Figure 4: Empirical confidence levels for N = 500. As in Figure 3, the denominator
is typically significantly different from zero left of the solid vertical line.
(The line is drawn at $CV_{\bar{x}} = CV_X/\sqrt{500} = 0.5$, which corresponds to
$CV_X \approx 11.2$ on the abscissa.) For further detail, see Figure 3.

Figure 5: Results of simulations at the points "A", "B", and "C" (as shown
in Figures 3 and 4). Each plot shows the results of 40 simulation
runs, ordered by the magnitude of the estimated ratio ($\bar{y}/\bar{x}$). In each
run N = 500 subjects are simulated and analyzed using the Fieller and
index methods (the results of Hwang-bootstrap, Taylor, and bootstrap
BC_a were practically identical to the results of the Fieller method and
are therefore not shown). For 95% confidence limits, we expect that
about 2 of the 40 simulation runs are significantly different from the
true value (as indicated by the exclamation marks at the bottom of
each plot). For the points "B" and "C" the rightmost panel shows the
results of the index method at a larger scale. Despite this larger scale,
the estimated ratio of one simulation was still beyond the range of the
scale. This value is indicated numerically (42). The results at point
"D" are similar to the results at point "C" and are therefore not shown.
Table 1: Results of the example calculation for point "P1" in Figure 2

method           lower limit   estimate   upper limit
Fieller          -0.02         1.88       498.75
Taylor           -1.88         1.88       5.64
Index            -1.81         2.29       6.39
Zero-variance    -0.03         1.88       3.79

Note. Pang et al. (2002) measured the following pairs of values: $y_i$ =
(4.87, 8.30, 11.66) and $x_i$ = (6.34, 4.02, 2.88). Calculated are 95% confidence
limits, based on the quantiles of the Student t-distribution, $t_q(df = 2) =
\pm 4.3027$. For further details see the supplementary material provided with
this article.
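The numbers in Table 1 can be retraced with a few lines of Python. The sketch below is one reading of the four methods, not code from the supplementary material: the function name and the dictionary layout are made up for illustration, the values are treated as paired measurements (so the covariance enters), and the Taylor limits use the usual first-order approximation of the variance of a ratio, which is assumed here to correspond to Equation (7).

```python
import math


def ratio_confidence_limits(x, y, t):
    """Return {method: (lower, estimate, upper)} for paired samples.

    x is the denominator sample, y the numerator sample, and t the
    two-sided Student-t quantile for df = len(x) - 1.
    """
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    # Squared standard errors of the two means and covariance of the means.
    vx = sum((xi - xm) ** 2 for xi in x) / (n - 1) / n
    vy = sum((yi - ym) ** 2 for yi in y) / (n - 1) / n
    cxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / (n - 1) / n
    rho = ym / xm
    out = {}

    # Fieller: solve (ym - r*xm)^2 = t^2 * (vy - 2*r*cxy + r^2*vx) for r.
    qa = xm ** 2 - t ** 2 * vx
    qb = xm * ym - t ** 2 * cxy
    qc = ym ** 2 - t ** 2 * vy
    if qa > 0:  # denominator significantly different from zero: bounded limits
        root = math.sqrt(qb * qb - qa * qc)
        out["Fieller"] = ((qb - root) / qa, rho, (qb + root) / qa)
    else:  # unbounded limits are not computed in this sketch
        out["Fieller"] = None

    # Taylor: first-order approximation of the standard error of ym / xm.
    se_taylor = math.sqrt(vy - 2 * rho * cxy + rho ** 2 * vx) / abs(xm)
    out["Taylor"] = (rho - t * se_taylor, rho, rho + t * se_taylor)

    # Index: mean of the individual ratios with its standard error.
    ratios = [yi / xi for xi, yi in zip(x, y)]
    rm = sum(ratios) / n
    se_index = math.sqrt(sum((r - rm) ** 2 for r in ratios) / (n - 1) / n)
    out["Index"] = (rm - t * se_index, rm, rm + t * se_index)

    # Zero-variance: the denominator mean is treated as a fixed constant.
    se_zero = math.sqrt(vy) / abs(xm)
    out["Zero-variance"] = (rho - t * se_zero, rho, rho + t * se_zero)
    return out
```

Called with x = (6.34, 4.02, 2.88), y = (4.87, 8.30, 11.66) and t = 4.3027, the function gives values that agree with Table 1 to within about 0.01 for the bounded limits. The upper Fieller limit is very sensitive to rounding because the leading coefficient of the quadratic is close to zero.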






Table 2: Standard methods to calculate confidence limits for ratios

distribution of (X, Y)    further restrictions    adequate method
bivariate normal                                  Fieller
bivariate normal          CVx < 1/3               Taylor
not necessarily normal    N > 15                  Hwang-bootstrap
not necessarily normal    N > 15; CVx < 1/3       Standard bootstrap

Note. The restrictions are meant as rules of thumb and apply to the case
that we are interested in 95% confidence limits.
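The rules of thumb in Table 2 can be written down as a small lookup. The Python sketch below is only an illustration: the function name and its parameters are hypothetical, `cv_x` stands for the coefficient of variation called CVx in the table, and the thresholds are the ones given there for 95% confidence limits.

```python
def adequate_methods(bivariate_normal, n, cv_x):
    """Return the methods that Table 2 lists as adequate for this situation.

    bivariate_normal: True if (X, Y) can be assumed to be bivariate normal.
    n: sample size N.
    cv_x: the coefficient of variation called CVx in Table 2.
    """
    methods = []
    if bivariate_normal:
        methods.append("Fieller")  # no further restriction
        if cv_x < 1 / 3:
            methods.append("Taylor")
    # The bootstrap rows do not require normality, so they apply in both cases.
    if n > 15:
        methods.append("Hwang-bootstrap")
        if cv_x < 1 / 3:
            methods.append("Standard bootstrap")
    return methods
```

For example, `adequate_methods(True, 3, 0.4)` returns only the Fieller method, while for non-normal data with N = 10 the list is empty, which means that none of the standard methods in the table is covered by these rules of thumb.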
Table 3: Hypothetical example for spurious correlations

county   women ($x_i$)   babies ($y_i$)   storks   birth-rate (babies/women)   stork-rate (storks/women)
1        1               15.8             3.2      15.8                        3.2
2        2               20.2             4.1      10.1                        2.1
3        3               25.4             5.6      8.5                         1.9
4        4               30.1             6.3      7.5                         1.6

Note. Women are ×10000.
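The two rate columns of Table 3 can be recomputed from the counts, and the correlation between them can be checked directly. The Python sketch below does this for the four hypothetical counties; the helper name `pearson` is an assumption made for illustration, and the counts are the ones listed in the table.

```python
import math

women = [1, 2, 3, 4]  # in units of 10000
babies = [15.8, 20.2, 25.4, 30.1]
storks = [3.2, 4.1, 5.6, 6.3]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    am, bm = sum(a) / n, sum(b) / n
    sab = sum((ai - am) * (bi - bm) for ai, bi in zip(a, b))
    saa = sum((ai - am) ** 2 for ai in a)
    sbb = sum((bi - bm) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


# Both rates share the same denominator (the number of women).
birth_rate = [b / w for b, w in zip(babies, women)]
stork_rate = [s / w for s, w in zip(storks, women)]

r_rates = pearson(birth_rate, stork_rate)
```

For these data `r_rates` is close to 1: the two rates, which share the number of women as their denominator, are almost perfectly correlated.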
Supplementary material: Details about the cited studies

This supplement gives a short description of the studies for which the CVs 
are shown in the Figures 3 and 4. The exact values for numerator and 
denominator variability and the sample sizes are also shown in Table 4. If 
specified, I describe the method which was used to calculate confidence limits 
(or SEM, which are typically interpreted as 68% confidence limits). 

Capuron et al., 2003: Data are from Table 1, p. 909. The data describe
the ratio of kynurenine (KYN) to tryptophan (TRP) during interferon
(IFN)-α therapy: KYN/TRP. Method used: Index method (p. 908).
Background: TRP degradation into KYN by the enzyme indoleamine-2,3-dioxygenase
during immune activation may contribute to the development of
depressive symptoms during IFN-α therapy. 26 patients with malignant
melanoma received IFN-α treatment and, in parallel, either an
antidepressant (paroxetine) or placebo. Conditions: Antidepressant-free
patients vs. paroxetine-treated patients, measured at treatment initiation
and at weeks 2, 4, and 12.

Franz, 2003: Data are from Table 2, p. 219 and from my own records. The
data describe the effect of an illusionary change of object size relative to the
effect of a physical change of object size at different times of a reach-to-grasp
movement. The ratio (illusion-effect)/(physical-size-effect) was calculated
for each time point. The study is also discussed in Franz (2004) and Franz,
Scharnowski, and Gegenfurtner (2005), where recalculations using the
correct Fieller method are also given. Method used: Index method. Background:
The planning/control model of action (Glover & Dixon, 2001a; Glover, 2004,
2002) assumes that grasping is sensitive to certain illusionary changes of
object size only in early stages of the movement (planning), but not in later
stages (control). In consequence, the relative effects of these illusions should
decrease during a grasping movement. Conditions: Grasp aperture measured
at the start of the movement (t = 0%), at the time of the maximum grip
aperture (t = 100%), and at intermediate times (t = 25%, t = 50%, t = 75%).

Maes et al., 2000: Data are from Table 1, p. 912. The data describe
the ω6/ω3 polyunsaturated fatty acids (PUFAs) ratios. Method used: Index
method on ^-transformed scores (p. 912). Background: Psychological stress
in humans induces the production of proinflammatory cytokines. An imbalance
of ω6 to ω3 PUFAs in the peripheral blood causes an overproduction
of proinflammatory cytokines. The study examined whether an imbalance
in ω6 to ω3 PUFAs in human blood predicts a greater production of
proinflammatory cytokines in response to psychological stress. Conditions: ω6/ω3
ratios a few weeks before (PRE) and after (POST) as well as one day before
(STRESS) a difficult oral examination. Participants were also divided into
groups with low/high fatty acid status.

Marsland, Henderson, Chambers, & Baum, 2002: Data are from
Table 1, p. 867. Data describe the ratio of T-helper (CD4+) cells to
T-suppressor/cytotoxic (CD8+) cells: CD4+/CD8+. Method used: Not
described. Background: To explore the stability of immune reactivity in
humans, the study assessed lymphocyte responses to a speech task and a mental
arithmetic task. The dependent measure was (among others) the ratio of CD4+
to CD8+ cells. Conditions: Mental arithmetic task, Speech task, Baseline
performances.

Metzger et al., 2002: Data are from Table 2, p. 54. Data describe the 
amplitude of P50 event-related brain potentials in response to two auditory 
clicks (i.e., second click amplitude/first click amplitude). Method used: Index
method. Background: Individuals with post traumatic stress disorder
(PTSD) have been found to show several event-related brain potential abnor- 
malities including P50 suppression. Female Vietnam nurse veterans with and 
without current PTSD completed P50 paired-click tasks: Two clicks were 
presented and the amplitude of the P50 for each of the clicks was determined. 
Conditions: Current PTSD versus never PTSD. 

Müller, Rau, Brody, Elbert, & Heinle, 1995: Data are from Table 1,
p. 74. Data describe the relationship of low density lipoprotein (LDL)
cholesterol to high density lipoprotein (HDL) cholesterol: LDL/HDL. Method used:
Not described. Background: The relationship between habitual anger coping
styles, especially anger expression in a socially assertive manner, and serum
lipid concentrations was assessed. The LDL/HDL ratio was analyzed because
it provides a predictor for coronary heart disease. Conditions: Two groups:
male versus female.
Olincy et al., 2000: Data are from Table 2, p. 972. Data describe the
amplitude of P50 event-related brain potentials in response to two auditory
clicks (i.e., test P50 amplitude/conditioning P50 amplitude). Method used:
Index method (p. 972). Background: Attention-deficit/hyperactivity disorder
(ADHD) and schizophrenia are both conceptualized as disorders of attention.
Failure to inhibit the P50 auditory event-evoked response, extensively
studied in schizophrenia, could also occur in ADHD patients, if these two illnesses
have common underlying neurobiological substrates. The study examined
the inhibition of the P50 auditory event-evoked potential in unmedicated
adults with ADHD, schizophrenic outpatients, and normal control subjects.
Auditory stimuli were presented in a paired stimulus, conditioning-testing
paradigm. The ratio of the test to the conditioning response amplitudes
was observed. Conditions: Three groups: unmedicated adults with ADHD,
schizophrenic outpatients, and normal control subjects.

Pang et al., 2002: Because this study gave excellent details about the
data, I could use it as an example in this article. For simplicity (and because
we are only interested in the statistical properties of the data), I used the
following aliases for the conditions:

alias   cell   parameter                          source
P1      OFF    $\Delta g_{Cl}/\Delta g_C$ (NR)    Table 1, p. 24
P2      ON     $\Delta g_{Cl}/\Delta g_C$ (NR)    Table 1, p. 24
P3      OFF    $Q_C(P+S+I)/Q_C(NR)$               Table 2, p. 25
P4      ON     $Q_C(P+S+I)/Q_C(NR)$               Table 2, p. 25

Also, for simplicity, I used the absolute values in the cases P3 and P4 (the
values are negative in the original study; using the absolute values does not
change anything for our analysis). Background: Pang et al. (2002) investigated
the relative contributions of bipolar and amacrine cell input to
light responses of 3 and 5 retinal ganglion cells. Two of the ratios describe
the light-evoked changes in chloride conductance relative to the cation
conductance, $\Delta g_{Cl}/\Delta g_C$ (NR), in normal Ringer's (NR) solution; the
other two ratios describe the light-evoked charge transfer in picrotoxin +
strychnine + imidazole-4-acetic acid (P+S+I) relative to NR: $Q_C(P+S+I)/Q_C(NR)$.
Method used: Index method. Note: In the calculation of the variances,
Pang et al. sometimes divided by N (describing the sample variability)
and sometimes by N - 1 (estimating the population variability). For
consistency, I always used N - 1 in my calculations. Background: Light-evoked
postsynaptic currents (lePSCs) were recorded from ON, OFF and ON-OFF
ganglion cells in dark-adapted salamander retinal slices under voltage clamp
conditions, and the cell morphology was examined using Lucifer yellow
fluorescence with confocal microscopy. The charge transfer of lePSCs in NR and
in P+S+I was compared.

Richter, Hinton, Meissner, & Scheller, 1995: Data are from Table 1, 
p. 133/134. Data represent salivary [K+]/[Na+] ratios. Method used: Index 
method with log-transformed data (p. 137). Background: It was hypothe- 
sized that choice reaction-time testing would cause salivary [K+]/[Na+] to 
increase. Relative contributions of [K+] and [Na+] to ratio changes were 
investigated in 23 hypertensives and 10 hospital staff. Changes in post-rest 
and post-test ionic concentrations and [K+]/[Na+] were investigated. Con- 
ditions: 5 conditions: day 1 (relaxed), day 2 (pre-test), unpaced RT task, 
paced RT task, post-test (rest); Two groups: Hypertensives and control 
group. 

Schumann et al., 1998: Data are from Table 2, p. 1374. They repre- 
sent the ratio of cerebral blood flow (CBF) to cerebral blood volume (CBV), 
as measured by PET: CBF/CBV. Method used: Index method and non- 
parametric tests (p. 1371). Background: Local cerebral perfusion pressure 
(CPP), a crucial parameter that should allow a better assessment of the 
haemodynamic compromise in cerebrovascular diseases, is not currently mea- 
surable by non-invasive means. Experimental and clinical studies have sug- 
gested that the regional ratio of cerebral blood flow to cerebral blood volume 
(CBF/CBV), as measured by PET, represents an index of local CPP in focal 
ischaemia. The study was designed to evaluate further the reliability of the 
CBF/CBV ratio during manipulations of CPP by deliberately varying mean 
arterial pressure (MAP) in the anesthetized baboon. Cortical CBF, CBV, 
cerebral metabolic rate for oxygen and oxygen extraction fraction were mea- 
sured by PET in 10 anesthetized baboons. Conditions: Five baboons (Group 
A) underwent four PET examinations at different levels of MAP: base line, 
moderate hypotension, minor hypotension, profound hypotension. Five other 
baboons (Group B) were subjected to hypertension and were compared with 
their base line state. 
Serrien & Wiesendanger, 2001: Data are from Table 1, p. 419. Data
present the ratio of grip force to load force during grasping:
grip-force/load-force. Method used: Not described. Background: The study examined
interlimb interactions of grasping forces during a bimanual manipulative
assignment that required the execution of a drawer-opening task with the left
hand and an object-holding task with the right hand. The grip/load-force
ratio of the bimanual task was compared with the unimanual performance in
order to investigate the coordinative constraint between grip and load force.
Conditions: Unimanual versus bimanual; object holding versus drawer opening.

Sloan et al., 1994: Data are from Table 2, p. 93. Data present the ratio 
of low (LF) to high (HF) frequency bands of heart period variability (HPV): 
LF/HF. Method used: Mixed effect regression model on log-transformed data 
(p. 92). Background: The study investigated changes in cardiac autonomic 
control during psychological stress in ambulatory subjects. 24-h electrocar- 
diographic recordings of 33 healthy subjects were analyzed for heart period 
variability responses associated with periodic diary entries measuring physi- 
cal position, negative affect, and time of day. A total of 362 diary entries were 
made during the 24-h sessions, each in response to a device which signaled 
on an average of once per hour. HPV was analyzed in the frequency domain, 
yielding estimates of spectral power in low and high frequency bands, as well 
as the LF/HF ratio. Conditions: Standing, sitting, reclining positions. 

Willemsen, Carroll, Ring, & Drayson, 2002: Data are from Table 1,
p. 225. Data describe the ratio of T-helper (CD4+) cells to
T-suppressor/cytotoxic (CD8+) cells: CD4+/CD8+. Method used: Not
specified. Background: To examine gender differences in immune reactions to
stress and relationships between immune and cardiovascular reactivity, mea- 
sures of cellular and mucosal immunity and cardiovascular activity were 
recorded in 77 men and 78 women at rest and in response to active (mental 
arithmetic) and passive (cold pressor) stress tasks. Conditions: Two groups: 
Men versus Women; mental arithmetic, rest, and cold pressor. 
Table 4: Illusion effects and corrected illusion effects of the example studies.

study/simulation  condition                N    ȳ        σ_y      x̄       σ_x     σ_y/ȳ   σ_x/x̄   plot
Capuron 03        antidep-free init        15   1.6      0.5      35.9    8.4     0.312   0.234   y
Capuron 03        antidep-free week 2      15   3.7      1.4      30.4    10.5    0.378   0.345   y
Capuron 03        antidep-free week 4      15   2.8      1.1      30.7    11.1    0.393   0.362   y
Capuron 03        antidep-free week 12     15   2.8      0.8      38      6.7     0.286   0.176   y
Capuron 03        Paroxetine init.         11   1.3      0.5      33.5    9.2     0.385   0.275   y
Capuron 03        Paroxetine week 2        11   3.5      1.3      30.8    10.8    0.371   0.351   y
Capuron 03        Paroxetine week 4        11   2.8      0.9      31.4    7.3     0.321   0.232   y
Capuron 03        Paroxetine week 12       11   2.9      0.9      32.7    8       0.310   0.245   y
Franz 03          t=0                      26   0.226    0.612    0.011   0.062   2.708   5.636   y
Franz 03          t=25                     26   0.333    1.16     0.292   0.219   3.483   0.750   y
Franz 03          t=50                     26   0.974    1.443    0.765   0.315   1.482   0.412   y
Franz 03          t=75                     26   1.278    1.8      1.04    0.302   1.408   0.290   n
Franz 03          t=100                    26   1.474    1.927    1.119   0.324   1.307   0.290   y
Maes 03           Low (PRE)                17   28.97    3.41     2.82    0.87    0.118   0.309   y
Maes 03           Low (STRESS)             17   29.85    2.22     3.15    1.26    0.074   0.400   y
Maes 03           Low (POST)               17   29.95    6.93     3.1     1.55    0.231   0.500   y
Maes 03           High (PRE)               10   33.93    0.93     5.45    0.64    0.027   0.117   y
Maes 03           High (STRESS)            10   33.25    2.9      5.67    1.33    0.087   0.235   y
Maes 03           High (POST)              10   32.92    1.4      5.49    1.25    0.043   0.228   y
Marsland 02       Arithmetic Baseline      31   705      314      388     175     0.445   0.451   y
Marsland 02       Arithmetic Task          31   699      302      393     173     0.432   0.440   n
Marsland 02       Speech Baseline          31   719      314      396     168     0.437   0.424   n
Marsland 02       Speech Task              31   736      314      449     210     0.427   0.468   n
Metzger 02        P50 current              24   1.78     1.72     4.52    2.52    0.966   0.558   y
Metzger 02        P50 never                24   1.74     1.58     4.98    2.33    0.908   0.468   y
Müller 95         Males                    53   188.85   34.65    54.98   12.22   0.183   0.222   n
Müller 95         Females                  33   115.58   39.34    57.61   22.78   0.340   0.395   y
Olincy 00         Schizophrenia            16   2.53     1.58     1.53    0.85    0.625   0.556   y
Olincy 00         ADHD                     16   2.08     1.21     0.66    0.88    0.582   1.333   y
Olincy 00         Normal                   16   2.61     1.57     0.5     0.65    0.602   1.300   n
Pang 02           P1                       3    8.277    3.396    4.413   1.763   0.410   0.400   n
Pang 02           P2                       5    8.162    2.31     3.228   0.623   0.283   0.193   y
Pang 02           P3                       3    224      68       46      13      0.304   0.283   y
Pang 02           P4                       5    278      105      48      36      0.378   0.750   y
Richter 95        Hypertensive day 1       23   18.1     8.8      6.2     3.3     0.486   0.532   n
Richter 95        Hypertensive day 2       23   25.7     12.1     6.4     3.7     0.471   0.578   n
Richter 95        Hypertensive unpaced RT  23   34.7     15.5     7       4.4     0.447   0.629   y
Richter 95        Hypertensive paced RT    23   36.2     14.6     6.8     3.3     0.403   0.485   n
Richter 95        Hypertensive rest        23   29.8     11.4     6.2     2.8     0.383   0.452   n
Richter 95        Normals day 1            10   34.2     8.7      8.2     1.7     0.254   0.207   n
Richter 95        Normals day 2            10   41.7     11.8     6.5     1.1     0.283   0.169   n
Richter 95        Normals unpaced RT       10   53.8     23.6     7.5     2       0.439   0.267   n
Richter 95        Normals paced RT         10   50.7     21.7     7.3     2.2     0.428   0.301   n
Richter 95        Normals rest             10   46.4     12.2     7       1.5     0.263   0.214   n
Schumann 98       Hypotension Baseline     5    31.1     3.9      3.15    0.71    0.125   0.225   y
Schumann 98       Moderate Hypotension     5    27.5     4        3.61    1.09    0.145   0.302   y
Schumann 98       Minor Hypotension        5    24.7     3.5      3.2     0.5     0.142   0.156   y
Schumann 98       Profound Hypotension     5    19.7     4.9      3.63    0.55    0.249   0.152   y
Schumann 98       Baseline Hypertension    5    27.5     2        2.72    0.22    0.073   0.081   y
Schumann 98       Hypertension             5    36.1     2.4      2.84    0.11    0.066   0.039   y
Serrien 01        Onset Drawer U           6    20.31    4.52     13.92   3.34    0.223   0.240   n
Serrien 01        Impact Drawer U          6    24.53    4.04     15.32   3.18    0.165   0.208   n
Serrien 01        Onset Drawer B           6    21.34    3.84     14.52   3.5     0.180   0.241   n
Serrien 01        Impact Drawer B          6    23.72    4.35     15.24   3.25    0.183   0.213   n
Serrien 01        Static Object U          6    10.44    0.9      8.01    0.02    0.086   0.002   n
Serrien 01        Onset Object B           6    11.82    1.15     8.02    0.02    0.097   0.002   n
Serrien 01        Impact Object B          6    12.4     1.25     8.01    0.01    0.101   0.001   n
Sloan 94          Standing                 96   1210.4   984.2    243.6   619.82  0.813   2.544   y
Sloan 94          Sitting                  191  1354.5   973.5    518.5   613.07  0.719   1.182   y
Sloan 94          Reclining                28   1392.8   1205.93  559.8   759.38  0.866   1.357   y
Willemsen 02      Men Rest                 78   626      170      419     154     0.272   0.368   n
Willemsen 02      Men Arithmetic           78   589      143      423     154     0.243   0.364   n
Willemsen 02      Men Cold pressor         78   560      144      405     154     0.257   0.380   n
Willemsen 02      Women Rest               79   722      230      446     155     0.319   0.348   y
Willemsen 02      Women Arithmetic         79   694      203      473     184     0.293   0.389   n
Willemsen 02      Women Cold pressor       79   691      211      448     171     0.305   0.382   n
Point A                                    500           0.1      1       0.15    0.100   0.150   y
Point B                                    500           0.1      1       0.75    0.100   0.750   y
Point C                                    500           0.1      1       3       0.100   3.000   y
Point D                                    500           1.5      1       3       1.500   3.000   y

Note: The column "plot" indicates whether the data point is plotted in Figures 3 and 4.



